library_name: transformers
tags:
  - cybersecurity
  - mpnet
  - embeddings
  - classification
license: apache-2.0
language:
  - en
base_model:
  - microsoft/mpnet-base

MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification

This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.

Model Details

Model Description

This model is based on microsoft/mpnet-base and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.

Model Information

  • Base Model: microsoft/mpnet-base
  • Tasks: Text classification, embedding generation
  • Language: English

Intended Use

Primary Use

This model generates specialized embeddings that are useful for:

  • Identifying cybersecurity threat actor groups from textual descriptions
  • Cybersecurity threat intelligence analysis
  • Embedding-based retrieval tasks in cybersecurity contexts
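Embedding-based retrieval usually comes down to ranking stored vectors by cosine similarity to a query vector. The following is a minimal sketch of that idea, assuming the embeddings have already been computed. The helper names `cosine` and `rank_by_cosine` and the toy 3-dimensional vectors are illustrative only and are not part of this model's API; real embeddings from the model have many more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query, corpus):
    """Return (index, score) pairs for the corpus vectors, most similar first."""
    scores = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for precomputed embeddings (hypothetical values).
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]

ranking = rank_by_cosine(query, corpus)
print([index for index, _ in ranking])  # corpus indices, best match first
```

For larger collections, a vectorized or approximate nearest-neighbor search would likely be a better fit than this pure-Python loop.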

Out-of-Scope Use

This model is not intended for general language tasks outside cybersecurity contexts.

Performance Evaluation

The model was benchmarked against the original MPNet base model and several cybersecurity-specific language models:

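| Model          | Classification Accuracy | Embedding Variability |
|----------------|-------------------------|-----------------------|
| Original MPNet | 55.73%                  | 0.0798                |
| SecBERT        | 91.67%                  | 0.5911                |
| ATTACK-BERT    | 83.51%                  | 0.0960                |
| MPNet (Cyber)  | 72.74%                  | 0.1239                |
| SecureBERT     | 49.31%                  | 0.0071                |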

Downstream Tasks

  • Attribution of cybersecurity incidents
  • Automated analysis of threat intelligence reports
  • Embeddings for cybersecurity threat detection
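One simple way to use embeddings for attribution is a nearest-centroid classifier: average the embeddings of known reports per group, then assign a new report to the group whose centroid is most similar. The sketch below illustrates that generic technique under stated assumptions; it is not the classification head used during fine-tuning, and the group labels and 2-dimensional vectors are made-up placeholders.

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Average the L2-normalized embeddings of each label into one centroid."""
    embeddings = np.asarray(embeddings, dtype=float)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids = {}
    for label in sorted(set(labels)):
        mask = np.array([item == label for item in labels])
        centroids[label] = normed[mask].mean(axis=0)
    return centroids

def predict(embedding, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    v = np.asarray(embedding, dtype=float)
    v = v / np.linalg.norm(v)

    def score(label):
        c = centroids[label]
        return float(v @ c / np.linalg.norm(c))

    return max(centroids, key=score)

# Placeholder data: in practice each row would be an embedding of a report.
train = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = ["group_a", "group_a", "group_b", "group_b"]

centroids = fit_centroids(train, labels)
print(predict([0.8, 0.3], centroids))
```

A nearest-centroid rule needs no extra training and makes it easy to add a new group later, at the cost of ignoring within-group variation.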

Limitations

  • Best suited for English-language cybersecurity contexts
  • May require further fine-tuning for highly specific tasks

Usage

To use this model:

from transformers import AutoTokenizer, MPNetModel
import torch

model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)

inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
# Mean-pool the token embeddings into a single sentence vector
embeddings = outputs.last_hidden_state.mean(dim=1)

Alternatively, using the sentence-transformers library:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('selfconstruct3d/mpnet-classification-finetuned-cyber-groups')
embeddings = model.encode(sentences)
print(embeddings)
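The transformers example above averages over every token position, which is fine for a single sentence. When several sentences are batched with padding, a plain mean also averages the padded positions, so a mask-aware mean is a common alternative. The sketch below shows the arithmetic with NumPy on toy arrays so it runs without downloading the model; `masked_mean_pool` is a hypothetical helper name, and the same steps would apply to `outputs.last_hidden_state` and `inputs["attention_mask"]` as torch tensors.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len); 1 = real token, 0 = padding
    """
    mask = np.asarray(attention_mask, dtype=float)[..., None]  # (batch, seq_len, 1)
    summed = (np.asarray(token_embeddings, dtype=float) * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against division by zero
    return summed / counts

# Toy batch: two sequences of length 3 with hidden size 2.
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 4.0], [100.0, 100.0]],  # last position is padding
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

pooled = masked_mean_pool(tokens, mask)
print(pooled.shape)
```

In the toy batch, the padded position of the second sequence holds large values on purpose, to make it obvious that the mask keeps them out of the average.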

Training Details

Training Data

The model was fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.

Hyperparameters

  • Epochs: 10 (MLM), 20 (classification)
  • Batch size: 16
  • Learning rate: 5e-6 (MLM), 2e-6 (classification)
  • Hardware: GPU (CUDA-enabled)
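For anyone trying to reproduce a similar setup, the values above can be captured in a small configuration object. This is only a sketch: the `StageConfig` class is a hypothetical container, the MLM-then-classification order is an assumption, and so is the reading that the batch size of 16 applies to both stages. Settings not listed in this card (optimizer, scheduler, sequence length) are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one fine-tuning stage (hypothetical container)."""
    name: str
    epochs: int
    batch_size: int
    learning_rate: float

# Values copied from the list above. The stage order and the shared
# batch size are assumptions, not confirmed by the model card.
STAGES = [
    StageConfig("mlm", epochs=10, batch_size=16, learning_rate=5e-6),
    StageConfig("classification", epochs=20, batch_size=16, learning_rate=2e-6),
]

for stage in STAGES:
    print(f"{stage.name}: {stage.epochs} epochs, "
          f"batch size {stage.batch_size}, lr {stage.learning_rate}")
```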

Citation

If you use this model, please cite it as:

@misc{mpnet_cyber_finetune,
  author = {Hamzic, D.},
  title = {MPNet Fine-Tuned for Cybersecurity Group Classification},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}

Contact