---
library_name: transformers
tags:
- cybersecurity
- mpnet
- embeddings
- classification
license: apache-2.0
language:
- en
base_model:
- microsoft/mpnet-base
---

# MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification

This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.

## Model Details

### Model Description

This model is based on `microsoft/mpnet-base` and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.

### Model Information

- **Base Model:** microsoft/mpnet-base
- **Tasks:** Text classification, embedding generation
- **Language:** English

## Intended Use

### Primary Use

This model generates specialized embeddings that are useful for:

- Identifying cybersecurity threat actor groups from textual descriptions
- Cybersecurity threat intelligence analysis
- Embedding-based retrieval tasks in cybersecurity contexts

### Out-of-Scope Use

This model is not intended for general language tasks outside cybersecurity contexts.
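
As a rough illustration of the embedding-based retrieval use listed under Primary Use, the sketch below ranks documents by cosine similarity to a query vector. It is not part of the model's own tooling: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the toy 3-dimensional vectors are placeholders for real embeddings produced by the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (sequences of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k documents most similar to the query."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy vectors standing in for real embeddings (illustrative values only).
# With the real model, these would be embedding vectors converted to lists.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.1, 0.0]

ranking = rank_by_similarity(query_vec, doc_vecs, top_k=2)
print(ranking)
```

The helpers operate on plain Python sequences to keep the sketch dependency-free; for larger collections, a vectorized implementation or a dedicated vector index would likely be a better fit.
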
## Performance Evaluation

The model was benchmarked against state-of-the-art cybersecurity NLP models:

| Model          | Classification Accuracy | Embedding Variability |
|----------------|-------------------------|-----------------------|
| Original MPNet | 55.73%                  | 0.0798                |
| SecBERT        | 91.67%                  | 0.5911                |
| ATTACK-BERT    | 83.51%                  | 0.0960                |
| MPNet (Cyber)  | 72.74%                  | 0.1239                |
| SecureBERT     | 49.31%                  | 0.0071                |

### Downstream Tasks

- Attribution of cybersecurity incidents
- Automated analysis of threat intelligence reports
- Embeddings for cybersecurity threat detection

### Limitations

- Best suited for English-language cybersecurity contexts
- May require further fine-tuning for highly specific tasks

## Usage

To generate embeddings with `transformers`:

```python
from transformers import AutoTokenizer, MPNetModel
import torch

model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)
model.eval()

inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into a single sentence embedding.
# For a single unpadded input, a plain mean over the token axis is sufficient.
embeddings = outputs.last_hidden_state.mean(dim=1)
```

Alternatively, with `sentence-transformers`:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("selfconstruct3d/mpnet-classification-finetuned-cyber-groups")
embeddings = model.encode(sentences)
print(embeddings)
```

## Training Details

### Training Data

Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.
### Hyperparameters

- **Epochs:** 10 (MLM), 20 (classification)
- **Batch size:** 16
- **Learning rate:** 5e-6 (MLM), 2e-6 (classification)
- **Hardware:** GPU (CUDA-enabled)

## Citation

If using this model, please cite as:

```bibtex
@misc{mpnet_cyber_finetune,
  author    = {Hamzic, D.},
  title     = {MPNet Fine-Tuned for Cybersecurity Group Classification},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}
```

## Contact

- **Author:** Dženan Hamzić
- **Contact Information:** https://www.linkedin.com/in/dzenan-hamzic/