selfconstruct3d/cybersec_classifier

selfconstruct3d committed 17 days ago

Commit 8188807 · verified · 1 Parent(s): c0565d2

Create README.md

Files changed (1)

README.md +59 -0

README.md ADDED

	@@ -0,0 +1,59 @@

+---
+license: apache-2.0
+language:
+- en
+- de
+---
+# 🛡️ MLP Cybersecurity Classifier
+This repository hosts a lightweight `scikit-learn`-based MLP classifier trained to distinguish cybersecurity-related content from other text, using sentence-transformer embeddings. It supports English and German input texts.
+## 📦 Model Details
+- **Architecture**: `MLPClassifier` with hidden layers `(128, 64)`
+- **Embedding model**: [`intfloat/multilingual-e5-large`](https://huggingface.co/intfloat/multilingual-e5-large)
+- **Input**: Cleaned article or report text (stopwords removed)
+- **Output**: Binary label (e.g., `Cybersecurity`, `Not Cybersecurity`)
+- **Languages**: English, German
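To make the architecture above concrete, here is a minimal sketch of how a classifier of this shape could be set up with `scikit-learn`. It is not the author's training script: only the hidden layer sizes `(128, 64)` come from the model card. The random 1024-dimensional vectors stand in for `intfloat/multilingual-e5-large` embeddings, and the synthetic labels, `max_iter`, and `random_state` are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Assumption: random vectors stand in for real sentence embeddings
# (multilingual-e5-large produces 1024-dimensional vectors).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1024))

# Hypothetical labels, named after the example output classes in the card.
y = np.where(X[:, 0] > 0, "Cybersecurity", "Not Cybersecurity")

# Hidden layer sizes are taken from the model card; every other
# hyperparameter here is an illustrative choice, not the author's setting.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=200, random_state=42)
clf.fit(X, y)

# Predict string labels for a few embedding vectors.
preds = clf.predict(X[:5])
print(preds)
```

With real data, `X` would be replaced by the output of the embedding model, so the input width of the first layer follows the embedding dimension automatically.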
+## 🔧 Usage
+```python
+from sentence_transformers import SentenceTransformer
+from sklearn.model_selection import train_test_split
+from sklearn.preprocessing import LabelEncoder
+import pandas as pd
+import joblib
+from huggingface_hub import hf_hub_download
+# Load your cleaned dataset
+df = pd.read_csv("your_dataset.csv")  # Requires 'clean_text' and 'label' columns
+# Load the sentence transformer
+embedder = SentenceTransformer("intfloat/multilingual-e5-large")
+# Train-test split
+X_train, X_test, y_train, y_test = train_test_split(
+    df["clean_text"],
+    df["label"],
+    test_size=0.05,
+    stratify=df["label"],
+    random_state=42
+)
+# Encode labels
+label_encoder = LabelEncoder()
+y_train_enc = label_encoder.fit_transform(y_train)
+y_test_enc = label_encoder.transform(y_test)
+# Generate sentence embeddings
+X_train_emb = embedder.encode(X_train.tolist(), convert_to_numpy=True, show_progress_bar=True)
+X_test_emb = embedder.encode(X_test.tolist(), convert_to_numpy=True, show_progress_bar=True)
+# Load the trained classifier
+model_path = hf_hub_download(repo_id="selfconstruct3d/cybersec_classifier", filename="cybersec_classifier.pkl")
+model = joblib.load(model_path)
+# Predict
+y_pred = model.predict(X_test_emb)