library_name: transformers
tags:
- cybersecurity
- mpnet
- embeddings
- classification
license: apache-2.0
language:
- en
base_model:
- microsoft/mpnet-base
MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification
This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.
Model Details
Model Description
This model is based on microsoft/mpnet-base and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.
Model Information
- Base Model: microsoft/mpnet-base
- Tasks: Text classification, embedding generation
- Language: English
Intended Use
Primary Use
This model generates specialized embeddings that are useful for:
- Identifying cybersecurity threat actor groups from textual descriptions
- Cybersecurity threat intelligence analysis
- Embedding-based retrieval tasks in cybersecurity contexts
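As a rough illustration of the embedding-based retrieval use case, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a minimal example, not the author's pipeline: the function name `cosine_rank` is hypothetical, and the 3-dimensional toy vectors stand in for real embeddings produced by the model.

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity, plus the scores."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity of each document to the query
    order = np.argsort(-scores)         # highest similarity first
    return order, scores

# Toy vectors (assumption): in practice these would be model embeddings
doc_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_vec = np.array([0.9, 0.1, 0.0])

order, scores = cosine_rank(query_vec, doc_vecs)
print(order)  # indices of the stored vectors, most similar first
```

Normalizing both sides first means the dot product is the cosine similarity, so vector length does not affect the ranking.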
Out-of-Scope Use
This model is not intended for general language tasks outside cybersecurity contexts.
Performance Evaluation
The model was benchmarked against the original MPNet and several cybersecurity NLP models:
Model | Classification Accuracy | Embedding Variability |
---|---|---|
Original MPNet | 55.73% | 0.0798 |
SecBERT | 91.67% | 0.5911 |
ATTACK-BERT | 83.51% | 0.0960 |
MPNet (Cyber) | 72.74% | 0.1239 |
SecureBERT | 49.31% | 0.0071 |
Downstream Tasks
- Attribution of cybersecurity incidents
- Automated analysis of threat intelligence reports
- Embeddings for cybersecurity threat detection
Limitations
- Best suited for English-language cybersecurity contexts
- May require further fine-tuning for highly specific tasks
Usage
To use this model:
from transformers import AutoTokenizer, MPNetModel
import torch
model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)
inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)
# Mean-pool the token embeddings into a single sentence vector
embeddings = outputs.last_hidden_state.mean(dim=1)
Alternatively, using sentence-transformers:
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('selfconstruct3d/mpnet-classification-finetuned-cyber-groups')
embeddings = model.encode(sentences)
print(embeddings)
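The first snippet above averages over every token position, which is fine for a single sentence. When several sentences are tokenized together with padding, the padded positions would be averaged in as well. The sketch below shows one common way to handle this, mean pooling weighted by the attention mask. It uses NumPy arrays as stand-ins for the model output so it runs without downloading the model; the function name `masked_mean_pool` and the toy values are assumptions for illustration.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# Toy batch (assumption): two sequences, three positions, hidden size 2
token_embeddings = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
])
attention_mask = np.array([
    [1, 1, 0],
    [1, 1, 1],
])

pooled = masked_mean_pool(token_embeddings, attention_mask)
print(pooled)
```

With the transformers snippet, the corresponding inputs would be `outputs.last_hidden_state` and `inputs["attention_mask"]`, and the same arithmetic can be written with PyTorch tensor operations.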
Training Details
Training Data
Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.
Hyperparameters
- Epochs: 10 (MLM), 20 (classification)
- Batch size: 16
- Learning rate: 5e-6 (MLM), 2e-6 (classification)
- Hardware: GPU (CUDA-enabled)
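For reference, the hyperparameters above could be captured in a small configuration object such as the one sketched below. The class name `StageConfig` and the stage names are hypothetical. The sketch also assumes that the batch size of 16 applies to both stages and that MLM precedes classification, neither of which is stated explicitly here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one fine-tuning stage (illustrative container only)."""
    name: str
    epochs: int
    batch_size: int
    learning_rate: float

# Values taken from the list above; batch size per stage and stage order are assumptions
STAGES = [
    StageConfig(name="mlm", epochs=10, batch_size=16, learning_rate=5e-6),
    StageConfig(name="classification", epochs=20, batch_size=16, learning_rate=2e-6),
]

for stage in STAGES:
    print(f"{stage.name}: epochs={stage.epochs}, "
          f"batch_size={stage.batch_size}, lr={stage.learning_rate}")
```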
Citation
If you use this model, please cite it as:
@misc{mpnet_cyber_finetune,
author = {Hamzic, D.},
title = {MPNet Fine-Tuned for Cybersecurity Group Classification},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}
Contact
- Author: Dženan Hamzić
- Contact Information: https://www.linkedin.com/in/dzenan-hamzic/