---
title: README
emoji: 🏢
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
---
Anvilogic - Where AI Meets Cybersecurity
Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models!
Founded in 2019, Anvilogic specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.
Typosquatting collection
Typosquatting is a form of cyberattack in which malicious actors create fake domain names that are visually or phonetically similar to legitimate domains, intending to deceive users into visiting these sites. This collection provides models and datasets for detecting typosquatted domains by identifying and flagging them. It comprises the following:
Models
- Embedder: Produces vector representations (embeddings) of domain names, which are used to mine domains similar to a given one. Available in two variants: one based on RoBERTa (BPE tokenization) and one based on CANINE-c (character-level encoding).
- Cross-Encoder: Compares two domain names and predicts whether one is a typosquat of the other. Available in the same two variants: RoBERTa (BPE tokenization) and CANINE-c (character-level encoding).
- T5 Detection: A T5-derived model fine-tuned on a new task. Its input is the prefix "Is the first domain a typosquat of the second : " followed by the typosquat candidate domain and then the legitimate domain.
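The Embedder's role in mining similar domains can be illustrated with a retrieve-and-rank sketch. To keep the example self-contained, a bag of character bigrams stands in for the model's learned embeddings (an assumption: this is not the RoBERTa or CANINE-c Embedder, whose vectors would replace `char_ngrams` in practice). The function names `char_ngrams`, `cosine`, and `mine_similar` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(domain: str, n: int = 2) -> Counter:
    """Stand-in 'embedding': counts of character n-grams.

    Hypothetical placeholder for the Embedder model's output vector.
    """
    padded = f"^{domain.lower()}$"  # boundary markers so first/last chars count
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mine_similar(legitimate: str, candidates: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank candidate domains by similarity to a legitimate domain and keep the top_k."""
    target = char_ngrams(legitimate)
    scored = [(c, cosine(target, char_ngrams(c))) for c in candidates if c != legitimate]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]


result = mine_similar("paypal.com", ["paypa1.com", "paypall.com", "example.org", "wikipedia.org"], top_k=2)
print([domain for domain, _ in result])  # ['paypall.com', 'paypa1.com']
```

Candidates retrieved this way could then be handed to the Cross-Encoder or T5 Detection model for a pairwise verdict; whether the collection chains the models exactly like this is an assumption.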
Datasets
- Embedder training dataset: Dataset formatted to train the embedding model on (Anchor, Positive) pairs.
- Cross-Encoder: Dataset formatted to train the cross-encoder model on (Anchor, Positive, label) samples.
- T5 Detection: Dataset formatted to train the T5 model on (prompt, response) pairs.
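The three record shapes can be sketched as follows. Only the T5 prefix is quoted from the model description; the field names, the choice of the legitimate domain as anchor, the meaning of `label` (1 = typosquat), the single space separating the two domains in the prompt, and the `"yes"` response are all hypothetical placeholders, not the datasets' actual schema.

```python
from dataclasses import dataclass

# Prefix quoted verbatim from the T5 Detection model description.
T5_PREFIX = "Is the first domain a typosquat of the second : "


@dataclass
class EmbedderPair:
    """(Anchor, Positive) pair; field names are hypothetical."""
    anchor: str
    positive: str


@dataclass
class CrossEncoderSample:
    """(Anchor, Positive, label) sample; assumed label: 1 = typosquat, 0 = not."""
    anchor: str
    positive: str
    label: int


def t5_prompt(candidate: str, legitimate: str) -> str:
    """Prefix, then the typosquat candidate, then the legitimate domain.

    Assumption: the two domains are separated by a single space.
    """
    return f"{T5_PREFIX}{candidate} {legitimate}"


def t5_pair(candidate: str, legitimate: str, response: str) -> dict[str, str]:
    """(prompt, response) training pair; the real response vocabulary is not specified here."""
    return {"prompt": t5_prompt(candidate, legitimate), "response": response}


pair = EmbedderPair(anchor="google.com", positive="gooogle.com")
sample = CrossEncoderSample(anchor="google.com", positive="gooogle.com", label=1)
record = t5_pair("gooogle.com", "google.com", response="yes")  # "yes" is a placeholder
print(record["prompt"])
```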
Spaces
Several Spaces are provided for trying out the models above.