|
--- |
|
tags: |
|
- MALICIOUS |
|
- URL |
|
- DOMAIN |
|
- ANALYSIS |
|
- ML |
|
- IOT |
|
--- |
|
### URL Classification Dataset |
|
|
|
# About Dataset |
|
|
|
Malicious URLs and the websites behind them pose a serious threat to cybersecurity. They host unsolicited content such as spam, phishing pages, and drive-by downloads, and lure unsuspecting users into scams that lead to monetary loss, theft of private information, and malware installation. These threats cause billions of dollars in losses annually. |
|
This dataset was collected to provide a large number of malicious URL examples. The goal is to train a machine learning model that can proactively identify and block malicious URLs before they infect computer systems or spread across the internet. |
|
|
|
# Content |
|
|
|
This dataset consists of 651,191 URLs, categorized as follows: |
|
- 428,103 benign (safe) URLs |
|
- 96,457 defacement URLs |
|
- 94,111 phishing URLs |
|
- 32,520 malware URLs |
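|

The counts above show a clear class imbalance: benign URLs make up roughly two thirds of the data. As a minimal sketch, the snippet below checks that the four counts add up to the stated total and derives per-class shares and inverse-frequency weights. The label strings (`"benign"`, `"defacement"`, and so on) and the helper names are assumptions for illustration; the actual label values in the dataset files may differ.

```python
# Class counts copied from the dataset description.
# The dictionary keys are illustrative; the real label strings may differ.
CLASS_COUNTS = {
    "benign": 428_103,
    "defacement": 96_457,
    "phishing": 94_111,
    "malware": 32_520,
}


def class_shares(counts):
    """Return each class's fraction of the total number of URLs."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    This is the same heuristic scikit-learn applies for
    class_weight="balanced"; rarer classes receive larger weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


total = sum(CLASS_COUNTS.values())
print(f"total URLs: {total:,}")  # total URLs: 651,191

shares = class_shares(CLASS_COUNTS)
weights = balanced_class_weights(CLASS_COUNTS)
for label in CLASS_COUNTS:
    print(f"{label:<11} share={shares[label]:.1%} weight={weights[label]:.2f}")
```

Weights like these could be passed to a classifier that supports per-class weighting. Whether that helps depends on the model and the evaluation metric, so treat it as one option rather than a recommendation from the dataset authors.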
|
Curating a high-quality dataset is one of the most crucial steps in any machine learning project. This dataset has been carefully curated from five different sources to ensure diversity and reliability. |
|
|
|
# Warning |
|
If the model is not accessible via Hugging Face, please download it from the repository: `nishnk/url_classification_model.pkl` |
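|

Once the `.pkl` file has been downloaded manually, it can be loaded with Python's standard `pickle` module. The sketch below assumes the file was saved locally as `url_classification_model.pkl`. The helper name `load_model` is hypothetical, and the type of the unpickled object is not documented here, so the library that produced it must be installed for loading to succeed.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Load a pickled object from a local file.

    Unpickling can execute arbitrary code, so only load files
    obtained from a source you trust.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    with path.open("rb") as fh:
        return pickle.load(fh)


# Example usage (assumes the file sits in the current directory):
# model = load_model("url_classification_model.pkl")
```

How the loaded object is used for prediction (for example, what input features it expects) depends on how the model was trained and is not specified in this card.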
|
|
|
license: mit |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
- code_eval |