Llama3 8B Fine-Tuned for Domain Generation Algorithm Detection

This model is a fine-tuned version of Meta's Llama3 8B, adapted to detect domains produced by Domain Generation Algorithms (DGAs). Malware often uses DGAs to generate ever-changing domain names for its command-and-control (C&C) servers, which makes DGA detection a persistent challenge in cybersecurity.

Model Description

  • Base Model: Llama3 8B
  • Task: DGA Detection
  • Fine-Tuning Approach: Supervised Fine-Tuning (SFT) with domain-specific data.
  • Dataset: A custom dataset combining DGA domains from 68 malware families with legitimate domains from the Tranco list, covering both arithmetic and word-based DGAs.
  • Performance:
    • Accuracy: 94%
    • False Positive Rate (FPR): 4%
    • Excels in detecting hard-to-identify word-based DGAs.

The model draws on Llama3's semantic understanding of language to classify each domain as either malicious (DGA-generated) or legitimate, with high precision and recall.
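For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and false positive rate are conventionally computed from binary predictions. It is only an illustration of the standard definitions, not the authors' evaluation code. The function name `accuracy_and_fpr` and the label encoding (1 = DGA, 0 = legitimate) are assumptions made for this example.

```python
def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate) for binary labels.

    Assumed encoding for this sketch: 1 = DGA-generated, 0 = legitimate.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # DGA domain correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1          # legitimate domain wrongly flagged
        elif truth == 0 and pred == 0:
            tn += 1          # legitimate domain correctly passed
        else:
            fn += 1          # DGA domain missed
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR is the share of legitimate domains that were flagged as DGA.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy data, not taken from the model's evaluation.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(accuracy_and_fpr(y_true, y_pred))
```

A low FPR matters in this setting because legitimate domains vastly outnumber DGA domains in real traffic, so even a small rate of false alarms can produce many blocked or flagged benign lookups.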

Data

The model was trained on 2 million domains: 1 million DGA-generated domains and 1 million legitimate domains. The training data is stored in the file train_2M.csv. The model was evaluated on the per-family files in the Families_Test folder.
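One quick sanity check on a training file like this is to confirm the class balance before fine-tuning. The sketch below counts rows per label using only the standard library. The column layout of train_2M.csv is not described here, so the column name `label` and the helper name `count_labels` are hypothetical; adjust them to match the actual file.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="label"):
    """Count rows per class in a CSV file with a header row.

    `label_column` is a hypothetical column name; the real header of
    train_2M.csv may differ.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    # For a balanced 2M-row file, each of the two classes should total 1,000,000.
    print(count_labels("train_2M.csv"))
```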

The GitHub repository https://github.com/reypapin/Domain-Name-Classification-with-LLM contains the notebooks that describe how the model was trained and evaluated.

Article Reference

La O, R. L., Catania, C. A., & Parlanti, T. (2024). LLMs for Domain Generation Algorithm Detection. arXiv preprint arXiv:2411.03307.
