
title: README
emoji: 🏢
colorFrom: pink
colorTo: purple
sdk: static
pinned: false

Anvilogic - Where AI Meets Cybersecurity

Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models!
Founded in 2019, Anvilogic specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.

Typosquatting collection

Typosquatting is a form of cyberattack in which malicious actors create fake domain names that are visually or phonetically similar to legitimate domains, intending to deceive users into visiting these sites. This collection aims to detect typosquatted domains by identifying and flagging them. It comprises the following:

Models

Embedder: This model produces representations (embeddings) of domain names, which are used to mine similar domains. It is available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
Cross-Encoder: This model compares two domain names and predicts whether one domain is a typosquat of the other. It is available in the same two variants: RoBERTa (with BPE tokenization) and CANINE-c (with character-level encoding).
T5 Detection: This model is a T5 derivative trained on a new task. Its input is the prefix "Is the first domain a typosquat of the second : " followed by the typosquat candidate domain and then the legitimate domain.
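The sketch below shows how the Embedder and T5 Detection steps could chain together: embeddings are compared by cosine similarity to mine legitimate domains that resemble a candidate, and each mined pair is then formatted as a T5 input. It is a minimal illustration under stated assumptions, not the official pipeline. The embedding vectors are hypothetical toy values (real ones would come from the Embedder model), cosine similarity is an assumed choice of metric, and the helper names `build_t5_input` and `mine_candidates` are hypothetical. Only the prompt prefix is taken from the description above; the separator between the two domains is not documented, so a single space is assumed.

```python
import math
from typing import Dict, List, Sequence

# Prefix quoted verbatim from the T5 Detection description.
T5_PREFIX = "Is the first domain a typosquat of the second : "


def build_t5_input(candidate: str, legitimate: str, sep: str = " ") -> str:
    # ASSUMPTION: the separator between the two domains is undocumented; a space is used.
    return f"{T5_PREFIX}{candidate}{sep}{legitimate}"


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity between two embedding vectors (assumed metric).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mine_candidates(query_vec: Sequence[float],
                    index: Dict[str, Sequence[float]],
                    top_k: int = 2) -> List[str]:
    # Rank legitimate domains by similarity to the candidate's embedding
    # and keep the top_k most similar ones.
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [domain for domain, _ in ranked[:top_k]]


# HYPOTHETICAL toy embeddings; real vectors would be produced by the Embedder model.
index = {
    "google.com": [1.0, 0.0, 0.1],
    "g00gle.com": [0.98, 0.05, 0.12],
    "example.org": [0.0, 1.0, 0.0],
}
query = [0.99, 0.02, 0.1]  # hypothetical embedding of the candidate "gooogle.com"

for legit in mine_candidates(query, index, top_k=2):
    print(build_t5_input("gooogle.com", legit))
```

Pairing a cheap similarity search with a more expensive pairwise model (the Cross-Encoder or T5 Detection) keeps the number of pairs that need full scoring small.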

Datasets

Embedder training dataset: Dataset formatted to train the embedding model with (Anchor, Positive) pairs.
Cross-Encoder: Dataset formatted to train the cross-encoder model with (Anchor, Positive, label) samples.
T5 Detection: Dataset formatted to train the T5 model with (prompt, response) pairs.
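To make the three record shapes concrete, here is a hedged sketch that turns one labeled (candidate, legitimate) domain pair into a record for each dataset. The field names follow the tuples listed above, but several details are assumptions not stated in this README: that the anchor is the legitimate domain and the positive is the candidate, that labels are 1/0, that T5 responses are the strings "true"/"false", and that the two domains in the T5 prompt are separated by a space. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Prefix quoted verbatim from the T5 Detection model description.
T5_PREFIX = "Is the first domain a typosquat of the second : "


@dataclass
class EmbedderPair:
    anchor: str    # ASSUMPTION: the legitimate domain
    positive: str  # ASSUMPTION: a known typosquat of the anchor


@dataclass
class CrossEncoderSample:
    anchor: str
    positive: str
    label: int     # ASSUMPTION: 1 = typosquat, 0 = not a typosquat


@dataclass
class T5Sample:
    prompt: str
    response: str  # ASSUMPTION: target text is "true" or "false"


def to_records(candidate: str, legitimate: str, is_typosquat: bool
               ) -> Tuple[Optional[EmbedderPair], CrossEncoderSample, T5Sample]:
    # The embedder dataset holds only (Anchor, Positive) pairs,
    # so a non-typosquat pair yields no embedder record.
    pair = EmbedderPair(anchor=legitimate, positive=candidate) if is_typosquat else None
    sample = CrossEncoderSample(anchor=legitimate, positive=candidate,
                                label=int(is_typosquat))
    # ASSUMPTION: candidate first, legitimate second, separated by one space.
    t5 = T5Sample(prompt=f"{T5_PREFIX}{candidate} {legitimate}",
                  response="true" if is_typosquat else "false")
    return pair, sample, t5


pair, sample, t5 = to_records("g00gle.com", "google.com", is_typosquat=True)
print(pair)
print(sample)
print(t5)
```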

Spaces

Several Spaces are provided for trying out the models listed above.