|
--- |
|
base_model: meta-llama/Meta-Llama-3-8B |
|
library_name: peft |
|
license: mit |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
- precision |
|
- recall |
|
- f1 |
|
datasets: |
|
- AbuSayed1/DGA-DATASET |
|
--- |
|
|
|
## Llama3 8B Fine-Tuned for Domain Generation Algorithm Detection |
|
|
|
This model is a fine-tuned adaptation of Meta's Llama3 8B, tailored for detecting **Domain Generation Algorithms (DGAs)**. DGAs, commonly employed by malware, generate dynamic domain names for **command-and-control (C&C)** servers, posing a significant challenge in cybersecurity. |
|
|
|
## Model Description |
|
|
|
- **Base Model**: Llama3 8B |
|
- **Task**: DGA Detection |
|
- **Fine-Tuning Approach**: QLoRA-based supervised fine-tuning (SFT) with domain-specific data (see the sketch after this list). |
|
- **Dataset**: A combined dataset covering 59 malware families plus legitimate domains, drawn from three sources: benign domains from the Alexa Top 1 Million Sites list of reputable domains, and DGA domains from Bambenek Consulting’s feed of malicious algorithmically generated domains and the 360 Lab DGA Domains list. |
|
- **Performance**: |
|
  - **Accuracy**: 98.6% on known domains, 80–99.5% on unknown domains |
|
- Excels in detecting unknown DGAs. |
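The exact training configuration is documented in the linked GitHub repository; a minimal QLoRA setup along the following lines illustrates the approach. The hyperparameters (rank, alpha, dropout, target modules) and the 4-bit settings below are illustrative assumptions, not the values actually used for this model.

```python
# Minimal QLoRA setup sketch; hyperparameters are illustrative assumptions,
# not the exact configuration used to train this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Meta-Llama-3-8B"

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters trained on top of the quantized base model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Supervised fine-tuning then proceeds with a standard causal-LM trainer
# over prompts that pair a domain name with its "dga" / "legit" label.
```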
|
|
|
This model leverages the extensive semantic understanding of Llama3 to classify domains as either **malicious (DGA-generated)** or **legitimate** with high precision and recall. |
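For inference, the adapter can be attached to the base model with PEFT. The sketch below is a hedged example: the adapter repository id is a placeholder, and the prompt wording is an assumed classification format rather than a documented one.

```python
# Hedged inference sketch: the adapter repo id and the prompt format are
# placeholders/assumptions, not confirmed details of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "<this-repo-id>"  # placeholder for the fine-tuned adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Assumed prompt format: ask the model to label a single domain.
prompt = "Classify the following domain as 'dga' or 'legit': qwxkzpfvb.info\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```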
|
|
|
## Data |
|
|
|
The model was trained on 2.5 million domains, split between 1.5 million DGA domains and 1 million legitimate domains. |
|
|
|
Dataset Link: https://huggingface.co/datasets/AbuSayed1/DGA-DATASET |
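The dataset can be pulled directly from the Hub with the `datasets` library. The split name and column layout printed below are whatever the published dataset exposes; the snippet makes no assumptions about its schema beyond the repository id given above.

```python
# Quick look at the published training data.
from datasets import load_dataset

dataset = load_dataset("AbuSayed1/DGA-DATASET")
print(dataset)             # available splits and row counts
print(dataset[list(dataset.keys())[0]][0])  # inspect one example record
```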
|
|
|
The GitHub repository https://github.com/MDABUSAYED/StratosphereLinuxIPS/tree/LLM/modules/flowalerts/LLM describes how the model was trained and evaluated. |
|
|
|
|
|
## Citation |
|
|
|
|
|
**APA:** |
|
|
|
Sayed, M. A., Rahman, A., Kiekintveld, C., & Garcia, S. (2024). Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection. arXiv preprint arXiv:2410.21723. |