r3ddkahili committed on
Commit
9fe4157
·
verified ·
1 Parent(s): 84aa01c

Upload README.md

  1. README.md +30 -55
README.md CHANGED
@@ -1,3 +1,14 @@
1
  # Malicious URL Detection Model
2
 
3
  > A fine-tuned **BERT-LoRA** model for detecting malicious URLs, including phishing, malware, and defacement threats.
@@ -8,10 +19,10 @@ This model is a **fine-tuned BERT-based classifier** designed to detect **malici
8
 
9
  The model classifies URLs into **four categories**:
10
 
11
- - ✅ **Benign**
12
- - 🔴 **Defacement**
13
- - ⚠️ **Phishing**
14
- - 🛑 **Malware**
15
 
16
  It achieves **98% validation accuracy** and an **F1-score of 0.965**, ensuring robust detection capabilities.
17
 
@@ -21,11 +32,16 @@ It achieves **98% validation accuracy** and an **F1-score of 0.965**, ensuring r
21
 
22
  ### Use Cases
23
 
24
- ✔️ Real-time URL classification for cybersecurity tools✔️ Phishing and malware detection for online safety✔️ Integration into browser extensions for instant threat alerts✔️ Security monitoring for SOC (Security Operations Centers)
25
 
26
  ### Limitations
27
 
28
- ⚠️ May **misclassify short or obfuscated URLs**⚠️ Performance may degrade with **dynamic domain structures**⚠️ Requires **frequent retraining** to adapt to evolving threats
 
 
29
 
30
  ---
31
 
@@ -104,17 +120,17 @@ print(f"Prediction: {label_map[prediction]}")
104
 
105
  ## Deployment Options
106
 
107
- ### 1️⃣ Streamlit Web App
108
 
109
  - Deployed on **Streamlit Cloud, AWS, or Google Cloud**.
110
  - Provides **real-time URL analysis** with a user-friendly interface.
111
 
112
- ### 2️⃣ Browser Extension (Planned)
113
 
114
  - **Real-time scanning** of visited web pages.
115
  - **Dynamic threat alerts** with confidence scores.
116
 
117
- ### 3️⃣ API Integration
118
 
119
  - REST API for bulk URL analysis.
120
  - Supports **Security Operations Centers (SOC)**.
@@ -133,17 +149,17 @@ print(f"Prediction: {label_map[prediction]}")
133
 
134
  ### Data Source
135
 
136
- Dataset sourced from **Kaggle Malicious URLs Dataset**:📌 [Dataset Link](https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset)
137
 
138
  ### BibTeX Citation
139
 
140
  ```
141
  @article{maliciousurl2025,
142
- author = {Your Name},
143
  title = {Fine-Tuned BERT for Malicious URL Detection},
144
  year = {2025},
145
- journal = {Cybersecurity AI Research},
146
- url = {https://your-research-paper-link.com}
147
  }
148
  ```
149
 
@@ -151,7 +167,7 @@ Dataset sourced from **Kaggle Malicious URLs Dataset**:πŸ“Œ [Dataset Link](https
151
 
152
  ## Future Work
153
 
154
- 🚀 **Improvements Planned:**
155
 
156
  - **Better phishing URL detection** via adversarial training.
157
  - **Deploying as a real-time browser extension.**
@@ -159,44 +175,3 @@ Dataset sourced from **Kaggle Malicious URLs Dataset**:πŸ“Œ [Dataset Link](https
159
  - **Expanding detection to identify zero-day threats.**
160
 
161
  ---
162
-
163
- ## Uploading to Hugging Face
164
-
165
- To upload this model to **Hugging Face**, follow these steps:
166
-
167
- ```bash
168
- pip install transformers huggingface_hub
169
- ```
170
-
171
- ```python
172
- from huggingface_hub import create_repo
173
- create_repo("your-huggingface-model-name")
174
- ```
175
-
176
- ```python
177
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
178
- from huggingface_hub import HfApi
179
-
180
- model_name = "your-huggingface-model-name"
181
- model = AutoModelForSequenceClassification.from_pretrained("your-local-model-directory")
182
- tokenizer = AutoTokenizer.from_pretrained("your-local-model-directory")
183
-
184
- # Save & Push Model
185
- model.save_pretrained(f"{model_name}")
186
- tokenizer.save_pretrained(f"{model_name}")
187
-
188
- api = HfApi()
189
- api.upload_folder(
190
- folder_path=f"{model_name}",
191
- repo_id=f"your-huggingface-username/{model_name}",
192
- repo_type="model",
193
- )
194
- ```
195
-
196
- ---
197
-
198
- ## Conclusion
199
-
200
- The **Malicious URL Detection Model** provides **state-of-the-art** accuracy for detecting **phishing, malware, and defacement threats**. It is optimized for **real-time cybersecurity applications** and **deployed using Streamlit**.
201
-
202
- ✅ **Final F1-score: 0.965**✅ **Optimized for real-time detection**✅ **Ready for deployment via API & browser extension**
 
1
+ ---
2
+ language: en
3
+ tags:
4
+ - cybersecurity
5
+ - malicious-url-detection
6
+ - bert
7
+ - transformers
8
+ - phishing-detection
9
+ license: apache-2.0
10
+ ---
11
+
12
  # Malicious URL Detection Model
13
 
14
  > A fine-tuned **BERT-LoRA** model for detecting malicious URLs, including phishing, malware, and defacement threats.
 
19
 
20
  The model classifies URLs into **four categories**:
21
 
22
+ - **Benign**
23
+ - **Defacement**
24
+ - **Phishing**
25
+ - **Malware**
26
 
27
  It achieves **98% validation accuracy** and an **F1-score of 0.965**, ensuring robust detection capabilities.
28
 
 
32
 
33
  ### Use Cases
34
 
35
+ - Real-time URL classification for cybersecurity tools
36
+ - Phishing and malware detection for online safety
37
+ - Integration into browser extensions for instant threat alerts
38
+ - Security monitoring for SOC (Security Operations Centers)
39
 
40
  ### Limitations
41
 
42
+ - May **misclassify short or obfuscated URLs**
43
+ - Performance may degrade with **dynamic domain structures**
44
+ - Requires **frequent retraining** to adapt to evolving threats
45
 
46
  ---
47
 
 
120
 
121
  ## Deployment Options
122
 
123
+ ### Streamlit Web App
124
 
125
  - Deployed on **Streamlit Cloud, AWS, or Google Cloud**.
126
  - Provides **real-time URL analysis** with a user-friendly interface.
127
 
128
+ ### Browser Extension (Planned)
129
 
130
  - **Real-time scanning** of visited web pages.
131
  - **Dynamic threat alerts** with confidence scores.
132
 
133
+ ### API Integration
134
 
135
  - REST API for bulk URL analysis.
136
  - Supports **Security Operations Centers (SOC)**.
 
149
 
150
  ### Data Source
151
 
152
+ Dataset sourced from **Kaggle Malicious URLs Dataset**:
153
+ 📌 [Dataset Link](https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset)
154
 
155
  ### BibTeX Citation
156
 
157
  ```
158
  @article{maliciousurl2025,
159
+ author = {r3ddkahili},
160
  title = {Fine-Tuned BERT for Malicious URL Detection},
161
  year = {2025},
162
+ institution = {Western Sydney University}
 
163
  }
164
  ```
165
 
 
167
 
168
  ## Future Work
169
 
170
+ **Improvements Planned:**
171
 
172
  - **Better phishing URL detection** via adversarial training.
173
  - **Deploying as a real-time browser extension.**
 
175
  - **Expanding detection to identify zero-day threats.**
176
 
177
  ---
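
Note on the four categories and confidence scores: the README lists four output classes and mentions alerts "with confidence scores", but this diff does not show how class indices map to labels. The sketch below illustrates one way a four-element logit vector could be turned into a label and a confidence value. The label order in `LABELS`, the function names `softmax` and `classify`, and the sample logits are all assumptions made for illustration, not taken from the model.

```python
import math

# Assumed label order: the README names these four classes, but the
# index-to-label mapping is not visible in this diff, so it is hypothetical.
LABELS = ["Benign", "Defacement", "Phishing", "Malware"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, confidence) for one URL's four-class logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    # Made-up logits standing in for a real model output.
    label, confidence = classify([0.2, -1.3, 3.1, 0.4])
    print(f"Prediction: {label} ({confidence:.1%})")
```

In practice the logits would come from the fine-tuned classifier and the mapping should be read from the model's own configuration rather than hard-coded as it is here.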