---
library_name: transformers
tags:
- cybersecurity
- mpnet
- embeddings
- classification
license: apache-2.0
language:
- en
base_model:
- microsoft/mpnet-base
---

# MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification 

This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.

## Model Details

### Model Description

This model is based on `microsoft/mpnet-base` and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.
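The card does not state the masking rate or strategy used in the MLM stage. As a minimal sketch, assuming the standard BERT-style scheme that `transformers`' `DataCollatorForLanguageModeling` applies by default (select about 15% of tokens, then replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged), masking one tokenized sequence could look like this. The `mask_token_id` and `vocab_size` values in the tests are placeholders. MPNet's original pretraining objective (masked and permuted language modeling) is not reproduced here, and a real collator would also exclude special tokens from masking.

```python
import random
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def mask_tokens(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking sketch: returns (model inputs, MLM labels)."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mlm_probability:
            continue  # position not selected: no loss is computed here
        labels[i] = tok  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```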

### Model Information
- **Base Model:** microsoft/mpnet-base
- **Tasks:** Text classification, embedding generation
- **Language:** English

## Intended Use

### Primary Use

This model generates specialized embeddings that are useful for:
- Identifying cybersecurity threat actor groups from textual descriptions
- Cybersecurity threat intelligence analysis
- Embedding-based retrieval tasks in cybersecurity contexts
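For embedding-based retrieval, a common approach is to rank stored embeddings by cosine similarity to a query embedding. The sketch below is a dependency-free illustration of that ranking step; the function names are hypothetical, and in practice the vectors would come from the model (for example `model.encode(...)` in the `sentence-transformers` example further down, converted with `.tolist()`).

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k(
    query: Sequence[float], corpus: List[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors most similar to the query."""
    scored = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]
```

For a large corpus, a vectorized library or an approximate nearest-neighbour index would replace the linear scan shown here.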

### Out-of-Scope Use

This model is not intended for general language tasks outside cybersecurity contexts.

## Performance Evaluation

The model was compared with its base model (`microsoft/mpnet-base`, listed as Original MPNet) and with several cybersecurity-domain language models:

| Model            | Classification Accuracy | Embedding Variability |
|------------------|-------------------------|-----------------------|
| Original MPNet   | 55.73%                  | 0.0798                |
| SecBERT          | 91.67%                  | 0.5911                |
| ATTACK-BERT      | 83.51%                  | 0.0960                |
| MPNet (Cyber)    | 72.74%                  | 0.1239                |
| SecureBERT       | 49.31%                  | 0.0071                |
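Read relative to the base model, fine-tuning raised classification accuracy from 55.73% to 72.74%, a gain of 17.01 percentage points, while SecBERT and ATTACK-BERT score higher on this benchmark. The snippet below only recomputes these differences from the table values; it performs no new evaluation.

```python
# Classification accuracy (%) copied from the table above.
accuracy = {
    "Original MPNet": 55.73,
    "SecBERT": 91.67,
    "ATTACK-BERT": 83.51,
    "MPNet (Cyber)": 72.74,
    "SecureBERT": 49.31,
}

baseline = accuracy["Original MPNet"]
for name, acc in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    # Difference in percentage points relative to the un-fine-tuned base model
    print(f"{name:<15} {acc:6.2f}%  {acc - baseline:+6.2f} pp vs. base")
```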

### Downstream Tasks
- Attribution of cybersecurity incidents
- Automated analysis of threat intelligence reports
- Embeddings for cybersecurity threat detection
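One simple way to use the embeddings for attribution, independent of the model's own classification head, is nearest-centroid assignment: average the embeddings of reports known to belong to each group, then attribute a new description to the group with the most similar centroid. This is a hypothetical sketch, not the method used to produce the results above; the group labels and two-dimensional vectors in the tests are toy data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def centroids(examples: List[Tuple[str, Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the embeddings of each labeled group into one centroid vector."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for group, vec in examples:
        if group not in sums:
            sums[group] = [0.0] * len(vec)
        sums[group] = [s + v for s, v in zip(sums[group], vec)]
        counts[group] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def attribute(query: Sequence[float], cents: Dict[str, List[float]]) -> str:
    """Return the group whose centroid has the highest cosine similarity to the query."""
    def cos(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    return max(cents, key=lambda g: cos(query, cents[g]))
```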

### Limitations
- Best suited for English-language cybersecurity contexts
- May require further fine-tuning for highly specific tasks

## Usage

To extract embeddings with `transformers`:

```python
from transformers import AutoTokenizer, MPNetModel
import torch

model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)
model.eval()  # disable dropout for inference

inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
with torch.no_grad():  # gradients are not needed for embedding extraction
    outputs = model(**inputs)

# Mean-pool token vectors into one sentence embedding (exact for a single, unpadded input)
embeddings = outputs.last_hidden_state.mean(dim=1)
```

Alternatively, with the `sentence-transformers` library, which handles pooling internally:

```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('selfconstruct3d/mpnet-classification-finetuned-cyber-groups')
embeddings = model.encode(sentences)
print(embeddings)
```
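The `transformers` example above averages over every token position, which is exact for one unpadded sentence. When several sentences are tokenized together with `padding=True`, padding positions would be averaged in as well, so a common fix is to weight by the attention mask; in PyTorch this is typically written as `(hidden * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True)` with `mask = inputs["attention_mask"].float()`. The dependency-free sketch below shows the same computation on nested lists so the arithmetic is easy to check.

```python
from typing import List, Sequence


def masked_mean_pool(
    token_embeddings: Sequence[Sequence[Sequence[float]]],  # [batch][seq_len][hidden]
    attention_mask: Sequence[Sequence[int]],                 # [batch][seq_len], 1 = real token
) -> List[List[float]]:
    """Average token vectors per sequence, skipping padding positions (mask == 0)."""
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        total = [0.0] * len(tokens[0])
        count = 0
        for vec, m in zip(tokens, mask):
            if m:  # only real tokens contribute
                total = [t + v for t, v in zip(total, vec)]
                count += 1
        pooled.append([t / max(count, 1) for t in total])  # guard against all-padding rows
    return pooled
```

For a batch whose second sentence has one padding position, the masked mean of that sentence is just its single real token vector, whereas a plain mean would halve it.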

## Training Details

### Training Data

Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.

### Hyperparameters
- **Epochs:** 10 (MLM), 20 (classification)
- **Batch size:** 16
- **Learning rate:** 5e-6 (MLM), 2e-6 (classification)
- **Hardware:** GPU (CUDA-enabled)
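The hyperparameters above can be captured as a small per-stage configuration. This sketch assumes the batch size of 16 applies to both stages and that each batch is one optimizer step with no gradient accumulation; the card states neither the dataset size nor the training framework, so the example count in the tests is hypothetical. If the Hugging Face `Trainer` were used, these fields would correspond to `num_train_epochs`, `per_device_train_batch_size`, and `learning_rate` in `TrainingArguments`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one training stage, as listed in this card."""
    epochs: int
    batch_size: int
    learning_rate: float

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps for the stage: batches per epoch (last batch may be partial) times epochs."""
        return math.ceil(num_examples / self.batch_size) * self.epochs


MLM = StageConfig(epochs=10, batch_size=16, learning_rate=5e-6)
CLASSIFICATION = StageConfig(epochs=20, batch_size=16, learning_rate=2e-6)
```

With a hypothetical 10,000 training examples, the MLM stage would run 6,250 steps and the classification stage 12,500.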

## Citation

If using this model, please cite as:

```bibtex
@misc{mpnet_cyber_finetune,
  author = {Hamzic, D.},
  title = {MPNet Fine-Tuned for Cybersecurity Group Classification},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}
```

## Contact
- **Author:** Dženan Hamzić
- **Contact Information:** https://www.linkedin.com/in/dzenan-hamzic/