library_name: transformers
tags:
  - cybersecurity
  - mpnet
  - embeddings
  - classification
language:
  - en
base_model:
  - microsoft/mpnet-base

MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification

This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.

Model Details

Model Description

This model is based on microsoft/mpnet-base and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.

Model Information

Base Model: microsoft/mpnet-base
Tasks: Text classification, embedding generation
Language: English

Intended Use

Primary Use

This model generates specialized embeddings that are useful for:

Identifying cybersecurity threat actor groups from textual descriptions
Cybersecurity threat intelligence analysis
Embedding-based retrieval tasks in cybersecurity contexts
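As a rough illustration of the embedding-based retrieval use case, the sketch below ranks stored report embeddings by cosine similarity to a query embedding. The three-dimensional vectors are made-up placeholders, not output of this model, and `cosine_rank` is a hypothetical helper; in practice each vector would be an embedding produced by the model (768 dimensions for an MPNet base encoder).

```python
import numpy as np

def cosine_rank(query, corpus):
    """Return corpus row indices sorted from most to least similar to query."""
    corpus = np.asarray(corpus, dtype=float)
    query = np.asarray(query, dtype=float)
    # Normalise so that the dot product equals cosine similarity.
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n
    order = np.argsort(-scores)  # descending similarity
    return order, scores[order]

# Placeholder embeddings (hypothetical values, for illustration only).
corpus_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.7, 0.3, 0.1],
]
query_embedding = [1.0, 0.2, 0.0]

order, scores = cosine_rank(query_embedding, corpus_embeddings)
print(order.tolist())
```

Normalising once and storing the unit vectors keeps each later query to a single matrix-vector product.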

Out-of-Scope Use

This model is not intended for general language tasks outside cybersecurity contexts.

Performance Evaluation

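The model was benchmarked against the original MPNet and several cybersecurity-specific language models. It scores higher than the original MPNet on both metrics but trails SecBERT and ATTACK-BERT in classification accuracy: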

Model	Classification Accuracy	Embedding Variability
Original MPNet	55.73%	0.0798
SecBERT	91.67%	0.5911
ATTACK-BERT	83.51%	0.0960
MPNet (Cyber)	72.74%	0.1239
SecureBERT	49.31%	0.0071

Downstream Tasks

Attribution of cybersecurity incidents
Automated analysis of threat intelligence reports
Embeddings for cybersecurity threat detection
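One simple way embeddings could support attribution is a nearest-centroid lookup: average the embeddings of reports already attributed to each group, then assign a new report to the group whose centroid is most similar. This is only a sketch of that general technique, not the classification head used to train this model; the group names, two-dimensional vectors, and the `nearest_centroid` helper are hypothetical.

```python
import numpy as np

def nearest_centroid(query, labelled_embeddings):
    """Return the label whose mean embedding is most cosine-similar to query."""
    query = np.asarray(query, dtype=float)
    query = query / np.linalg.norm(query)
    best_label, best_score = None, -np.inf
    for label, vectors in labelled_embeddings.items():
        # Centroid of all embeddings already attributed to this group.
        centroid = np.mean(np.asarray(vectors, dtype=float), axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        score = float(centroid @ query)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical group names and placeholder vectors, for illustration only.
known = {
    "group_a": [[1.0, 0.0], [0.8, 0.2]],
    "group_b": [[0.0, 1.0], [0.1, 0.9]],
}
label, score = nearest_centroid([0.9, 0.1], known)
print(label)
```

A similarity threshold on the returned score would be one way to flag reports that do not resemble any known group.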

Limitations

Best suited for English language cybersecurity contexts
May require further fine-tuning for highly specific tasks

Usage

To generate an embedding with transformers:

from transformers import AutoTokenizer, MPNetModel
import torch

model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)

inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# Mean-pool the token vectors into one sentence embedding of shape (1, hidden_size)
embeddings = outputs.last_hidden_state.mean(dim=1)

To encode several sentences with sentence-transformers:

from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("selfconstruct3d/mpnet-classification-finetuned-cyber-groups")
embeddings = model.encode(sentences)
print(embeddings)
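The `.mean(dim=1)` call in the transformers snippet above averages over every token position. That is exact for a single unpadded sentence, but once several sentences of different lengths are batched with padding, the padded positions dilute the average. A common alternative is mean pooling weighted by the attention mask. The NumPy sketch below shows the arithmetic on toy arrays; `masked_mean_pool` is a hypothetical helper, and the same steps carry over to PyTorch tensors using `inputs["attention_mask"]` and `outputs.last_hidden_state`.

```python
import numpy as np

def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    hidden_states: (batch, seq_len, dim)
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    """
    hidden_states = np.asarray(hidden_states, dtype=float)
    mask = np.asarray(attention_mask, dtype=float)[..., None]  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against division by zero
    return summed / counts

# Toy batch: the second sequence has one padded position (mask 0).
hidden = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 2.0], [100.0, 100.0]],
]
mask = [[1, 1], [1, 0]]
pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

In the toy batch the padded vector `[100.0, 100.0]` has no effect on the second row of the result, which is the point of weighting by the mask.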

Training Details

Training Data

Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.

Hyperparameters

Epochs: 10 (MLM), 20 (classification)
Batch size: 16
Learning rate: 5e-6 (MLM), 2e-6 (classification)
Hardware: GPU (CUDA-enabled)
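For anyone trying to reproduce a similar two-stage schedule, the values above can be written down as per-stage settings. The sketch below is an assumption about how they might be organised, not the author's training script: the key names follow Hugging Face `TrainingArguments` parameters, and both the use of `Trainer` and the reading of the batch size as per-device are guesses.

```python
# Stage-specific settings copied from the hyperparameter list.
# Key names mirror Hugging Face TrainingArguments parameters (an assumption;
# the original training code is not published in this card).
MLM_STAGE = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-6,
}
CLASSIFICATION_STAGE = {
    "num_train_epochs": 20,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-6,
}

for name, cfg in [("mlm", MLM_STAGE), ("classification", CLASSIFICATION_STAGE)]:
    print(name, cfg)
```

Each dictionary could then be unpacked into `TrainingArguments(output_dir=..., **cfg)`, with the output directory chosen by the user.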

Citation

If using this model, please cite as:

@misc{mpnet_cyber_finetune,
  author = {Hamzic, D.},
  title = {MPNet Fine-Tuned for Cybersecurity Group Classification},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}

Contact

Author: Dženan Hamzić
Contact Information: https://www.linkedin.com/in/dzenan-hamzic/

Licence

This model is licensed for non-commercial use only (CC BY-NC 4.0). For commercial inquiries, please contact [email protected].