tags:
- MALICIOUS
- URL
- DOMAIN
- ANALYSIS
- ML
- IOT
URL Classification Dataset
About Dataset
Malicious URLs, and the websites behind them, pose a serious threat to cybersecurity. They host unsolicited content such as spam, phishing, and drive-by downloads, luring unsuspecting users into scams that lead to monetary loss, theft of private information, and malware installation. These threats result in billions of dollars in losses annually. This dataset collects a large number of malicious URL examples. The goal is to develop a machine learning model that can proactively identify and block malicious URLs before they infect computer systems or spread across the internet.
Content
This dataset consists of 651,191 URLs, categorized as follows:
- 428,103 benign (safe) URLs
- 96,457 defacement URLs
- 94,111 phishing URLs
- 32,520 malware URLs
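The counts above show that the classes are imbalanced, which matters when training and evaluating a classifier. As a minimal sketch, the snippet below computes each class's share of the dataset and a set of inverse-frequency class weights. The function names `class_shares` and `balanced_class_weights` are illustrative and not part of this repository; the weighting formula is the common "balanced" heuristic, not necessarily what was used to train the published model.

```python
# Class counts as stated in the dataset description above.
CLASS_COUNTS = {
    "benign": 428_103,
    "defacement": 96_457,
    "phishing": 94_111,
    "malware": 32_520,
}


def class_shares(counts):
    """Return each class's fraction of the total number of URLs."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count).

    Assumption: this is the usual "balanced" heuristic; the weights
    used for the published model, if any, are not documented here.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    shares = class_shares(CLASS_COUNTS)
    weights = balanced_class_weights(CLASS_COUNTS)
    print(f"total URLs: {sum(CLASS_COUNTS.values()):,}")
    for label in CLASS_COUNTS:
        print(f"{label:<11} share={shares[label]:.3f} weight={weights[label]:.3f}")
```

Benign URLs make up roughly two thirds of the data, so plain accuracy can look good even when the malware class is poorly detected. Per-class precision and recall are worth reporting alongside it.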
Curating a high-quality dataset is one of the most crucial steps in any machine learning project. This dataset has been carefully curated from five different sources to ensure diversity and reliability.
Warning:
If the model is not accessible via Hugging Face, please download it from the repository: nishnk/url_classification_model.pkl
license: mit
language:
- en
metrics:
- accuracy
- code_eval