---
library_name: transformers
tags:
- cybersecurity
- mpnet
- embeddings
- classification
license: apache-2.0
language:
- en
base_model:
- microsoft/mpnet-base
---

# MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification 

This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.

## Model Details

### Model Description

This model is based on `microsoft/mpnet-base` and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.

### Model Information
- **Base Model:** microsoft/mpnet-base
- **Tasks:** Text classification, embedding generation
- **Language:** English

## Intended Use

### Primary Use

This model generates specialized embeddings that are useful for:
- Identifying cybersecurity threat actor groups from textual descriptions
- Cybersecurity threat intelligence analysis
- Embedding-based retrieval tasks in cybersecurity contexts

### Out-of-Scope Use

This model is not intended for general language tasks outside cybersecurity contexts.

## Performance Evaluation

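The model was benchmarked against its base model and three cybersecurity-specific language models. It improves on the original MPNet and SecureBERT in classification accuracy but scores below SecBERT and ATTACK-BERT: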

| Model            | Classification Accuracy | Embedding Variability |
|------------------|-------------------------|-----------------------|
| Original MPNet   | 55.73%                  | 0.0798                |
| SecBERT          | 91.67%                  | 0.5911                |
| ATTACK-BERT      | 83.51%                  | 0.0960                |
| MPNet (Cyber)    | 72.74%                  | 0.1239                |
| SecureBERT       | 49.31%                  | 0.0071                |

### Downstream Tasks
- Attribution of cybersecurity incidents
- Automated analysis of threat intelligence reports
- Embeddings for cybersecurity threat detection

### Limitations
- Best suited for English-language cybersecurity contexts
- May require further fine-tuning for highly specific tasks

## Usage

Load the model with `transformers` and mean-pool the token outputs to obtain a sentence embedding:

```python
from transformers import AutoTokenizer, MPNetModel
import torch

model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)
model.eval()  # disable dropout for inference

inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
with torch.no_grad():  # no gradients are needed to extract embeddings
    outputs = model(**inputs)

# Mean-pool the token vectors into one sentence embedding of shape (1, hidden_size)
embeddings = outputs.last_hidden_state.mean(dim=1)
```
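The plain `.mean(dim=1)` above averages over every token position, which is fine for a single sentence but would also average padding vectors once several sentences of different lengths are batched with `padding=True`. A minimal sketch of mask-aware mean pooling follows; the helper name `masked_mean_pool` is hypothetical, and the toy tensors only stand in for real model outputs.

```python
import torch


def masked_mean_pool(last_hidden_state: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring padding positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over the hidden dim
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Clamp guards against division by zero for a row with no real tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


# Toy stand-in for model outputs: batch of 2, 3 tokens each, hidden size 2.
# The last token of the first row is padding and must not affect the result.
hidden = torch.tensor([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]],
                       [[2.0, 4.0], [4.0, 6.0], [6.0, 8.0]]])
mask = torch.tensor([[1, 1, 0],
                     [1, 1, 1]])
pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

With the real model, the same helper would be called as `masked_mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`.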

Alternatively, load the model with `sentence-transformers`:

```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('selfconstruct3d/mpnet-classification-finetuned-cyber-groups')
embeddings = model.encode(sentences)
print(embeddings)
```
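For the embedding-based retrieval use case, the vectors returned by `encode` can be ranked by cosine similarity against a query vector. The sketch below is one simple way to do that, not necessarily how the model was evaluated; `rank_by_cosine` is a hypothetical helper, and the 3-dimensional toy vectors stand in for real embeddings.

```python
from typing import Tuple

import numpy as np


def rank_by_cosine(query_vec: np.ndarray,
                   doc_vecs: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Return document indices sorted by cosine similarity, plus the sorted scores."""
    # L2-normalise so that the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)  # highest similarity first
    return order, scores[order]


# Toy vectors standing in for model.encode(documents) and model.encode(query)
doc_vecs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0]])
query_vec = np.array([2.0, 0.1, 0.0])

order, scores = rank_by_cosine(query_vec, doc_vecs)
print(order, scores)
```

In practice `doc_vecs` would be `model.encode(list_of_report_snippets)` and `query_vec` would be `model.encode(query_text)`; both return NumPy arrays by default.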

## Training Details

### Training Data

Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.

### Hyperparameters
- **Epochs:** 10 (MLM), 20 (classification)
- **Batch size:** 16
- **Learning rate:** 5e-6 (MLM), 2e-6 (classification)
- **Hardware:** GPU (CUDA-enabled)
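The two stages listed above can be written down as a small configuration sketch. This is illustrative only and is not the author's training script: the `StageConfig` class and `total_steps` helper are hypothetical, the batch size of 16 is assumed to apply to both stages, and the dataset size in the example is made up.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one fine-tuning stage (hypothetical container)."""
    name: str
    epochs: int
    batch_size: int
    learning_rate: float


# Values taken from the hyperparameter list; batch size assumed shared by both stages
MLM_STAGE = StageConfig("mlm", epochs=10, batch_size=16, learning_rate=5e-6)
CLS_STAGE = StageConfig("classification", epochs=20, batch_size=16, learning_rate=2e-6)


def total_steps(cfg: StageConfig, num_examples: int) -> int:
    """Optimizer steps for a stage, assuming no gradient accumulation."""
    return cfg.epochs * ceil(num_examples / cfg.batch_size)


# Hypothetical dataset size, used only to show the arithmetic
mlm_steps = total_steps(MLM_STAGE, num_examples=1000)
cls_steps = total_steps(CLS_STAGE, num_examples=1000)
print(mlm_steps, cls_steps)
```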

## Citation

If you use this model, please cite it as:

```bibtex
@misc{mpnet_cyber_finetune,
  author = {Hamzic, D.},
  title = {MPNet Fine-Tuned for Cybersecurity Group Classification},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}
```

## Contact
- **Author:** Dženan Hamzić
- **Contact Information:** https://www.linkedin.com/in/dzenan-hamzic/