---
library_name: transformers
tags:
- cybersecurity
- mpnet
- embeddings
- classification
license: apache-2.0
language:
- en
base_model:
- microsoft/mpnet-base
---
# MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification
This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.
## Model Details
### Model Description
This model is based on `microsoft/mpnet-base` and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.
### Model Information
- **Base Model:** microsoft/mpnet-base
- **Tasks:** Text classification, embedding generation
- **Language:** English
## Intended Use
### Primary Use
This model generates specialized embeddings that are useful for:
- Identifying cybersecurity threat actor groups from textual descriptions
- Cybersecurity threat intelligence analysis
- Embedding-based retrieval tasks in cybersecurity contexts
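
Embedding-based retrieval can be sketched as ranking stored report snippets by cosine similarity to a query embedding. This is a minimal illustration, not the author's pipeline: `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, and the toy three-dimensional vectors stand in for real embeddings produced by the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (index, score) pairs for corpus vectors, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration).
corpus = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.8, 0.6, 0.0]
ranking = rank_by_similarity(query, corpus)
print(ranking[0][0])  # index of the most similar corpus entry
```

In practice the lists would be replaced by the vectors returned from the model, one per report snippet.
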
### Out-of-Scope Use
This model is not intended for general language tasks outside cybersecurity contexts.
## Performance Evaluation
The model was benchmarked against the original MPNet and several security-focused language models:
| Model | Classification Accuracy | Embedding Variability |
|------------------|-------------------------|-----------------------|
| Original MPNet | 55.73% | 0.0798 |
| SecBERT | 91.67% | 0.5911 |
| ATTACK-BERT | 83.51% | 0.0960 |
| MPNet (Cyber) | 72.74% | 0.1239 |
| SecureBERT | 49.31% | 0.0071 |
### Downstream Tasks
- Attribution of cybersecurity incidents
- Automated analysis of threat intelligence reports
- Embeddings for cybersecurity threat detection
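
One simple way to use embeddings for attribution is a nearest-centroid rule: average the embeddings of known descriptions per group, then assign a new description to the group with the closest centroid. The sketch below is an assumption for illustration, not the classification head the model was trained with; `centroid`, `attribute`, the group labels, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def attribute(embedding, labelled_embeddings):
    """Return the label whose centroid is closest (Euclidean) to `embedding`."""
    centroids = {label: centroid(vecs) for label, vecs in labelled_embeddings.items()}
    return min(centroids, key=lambda label: math.dist(embedding, centroids[label]))


# Toy 2-D vectors and labels, for illustration only (not real model output).
known = {
    "group_a": [[1.0, 0.0], [0.8, 0.2]],
    "group_b": [[0.0, 1.0], [0.2, 0.8]],
}
print(attribute([0.9, 0.3], known))  # prints "group_a"
```
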
### Limitations
- Best suited for English-language cybersecurity contexts
- May require further fine-tuning for highly specific tasks
## Usage
To extract embeddings with the `transformers` library:
```python
from transformers import AutoTokenizer, MPNetModel
import torch

model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)

inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
# Mean-pool the token embeddings into a single sentence embedding
embeddings = outputs.last_hidden_state.mean(dim=1)
```
Alternatively, load the model with the `sentence-transformers` library:
```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('selfconstruct3d/mpnet-classification-finetuned-cyber-groups')
embeddings = model.encode(sentences)
print(embeddings)
```
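
When several texts of different lengths are encoded in one batch, the tokenizer pads the shorter ones, and a plain `mean(dim=1)` over `last_hidden_state` would average the padding positions as well. A common remedy, sketched here as an assumption rather than the author's documented pooling, is to weight each token by the attention mask. `masked_mean_pool` is a hypothetical helper name, and the tensors are toy values standing in for `outputs.last_hidden_state` and `inputs["attention_mask"]`.

```python
import torch


def masked_mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings per sequence, ignoring padded positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over hidden dims
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts


# Toy batch: 2 sequences, 3 token positions, hidden size 2.
hidden = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],      # no padding
])
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])
pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

With the real model, the call would be `masked_mean_pool(outputs.last_hidden_state, inputs["attention_mask"])` after tokenizing a list of texts with `padding=True`.
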
## Training Details
### Training Data
Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.
### Hyperparameters
- **Epochs:** 10 (MLM), 20 (classification)
- **Batch size:** 16
- **Learning rate:** 5e-6 (MLM), 2e-6 (classification)
- **Hardware:** GPU (CUDA-enabled)
## Citation
If you use this model, please cite it as:
```bibtex
@misc{mpnet_cyber_finetune,
author = {Hamzic, D.},
title = {MPNet Fine-Tuned for Cybersecurity Group Classification},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}
```
## Contact
- **Author:** Dženan Hamzić
- **Contact Information:** https://www.linkedin.com/in/dzenan-hamzic/