---
title: README
emoji: 🏢
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
---

# **Anvilogic - Where AI Meets Cybersecurity**

Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models!

Founded in 2019, Anvilogic specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.

## Typosquatting collection

Typosquatting is a form of cyberattack in which malicious actors create fake domain names that are visually or phonetically similar to legitimate domains, intending to deceive users into visiting these sites.

This collection aims to detect typosquatted domains by identifying and flagging them. It consists of the following:

### Models

- **Embedder:** This model produces vector representations (embeddings) of domain names, which are used to mine similar domains. It is available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).

- **Cross-Encoder:** This model compares two domain names and predicts whether one is a typosquat of the other. Like the Embedder, it is available in a RoBERTa-based variant (BPE tokenization) and a CANINE-c-based variant (character-level encoding).

- **T5 Detection:** This model is derived from T5 and trained on a new task. Its input is the prefix "Is the first domain a typosquat of the second : " followed by the *typosquat candidate domain* and then the *legitimate domain*.
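
The Embedder and Cross-Encoder can be chained in a retrieve-then-verify pattern: mine the domains closest to a legitimate one by embedding similarity, then run a pairwise check on each candidate. The sketch below is a minimal, dependency-free illustration of that pattern, not Anvilogic's implementation. `embed` is a hypothetical stand-in for the Embedder (a bag of character trigrams instead of a RoBERTa or CANINE-c model), and `is_typosquat` is a hypothetical stand-in for the Cross-Encoder (a Levenshtein-distance threshold). All function names, the `top_k` and `max_edits` parameters, and the example domains are assumptions.

```python
import math
from collections import Counter


def embed(domain: str, n: int = 3) -> Counter:
    """HYPOTHETICAL stand-in for the Embedder: a bag of character n-grams.

    The real Embedder is a RoBERTa- or CANINE-c-based model; this placeholder
    only lets the pipeline run without downloading anything.
    """
    padded = f"^{domain.lower()}$"  # mark start/end so prefixes and suffixes count
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def is_typosquat(candidate: str, legitimate: str, max_edits: int = 2) -> bool:
    """HYPOTHETICAL stand-in for the Cross-Encoder: flag near-identical domains."""
    return candidate != legitimate and levenshtein(candidate, legitimate) <= max_edits


def mine_candidates(legitimate: str, pool: list[str], top_k: int = 3) -> list[str]:
    """Stage 1: rank the pool by embedding similarity and keep the top_k domains."""
    query = embed(legitimate)
    scored = sorted(
        ((cosine(query, embed(d)), d) for d in pool if d != legitimate),
        reverse=True,
    )
    return [d for _, d in scored[:top_k]]


def detect(legitimate: str, pool: list[str], top_k: int = 3) -> list[str]:
    """Stage 2: keep only the mined candidates that the pairwise check confirms."""
    return [d for d in mine_candidates(legitimate, pool, top_k) if is_typosquat(d, legitimate)]


pool = ["paypa1.com", "paypal-login.com", "example.org", "paypall.com", "wikipedia.org"]
print(mine_candidates("paypal.com", pool))  # ['paypall.com', 'paypa1.com', 'paypal-login.com']
print(detect("paypal.com", pool))           # ['paypall.com', 'paypa1.com']
```

With the real models, `embed` would return a dense vector and `is_typosquat` would threshold the Cross-Encoder's score. Splitting the work this way keeps the more expensive pairwise model off the full domain pool, since it only scores the few candidates that the similarity search surfaces.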

### Datasets

- **Embedder training dataset:** Dataset formatted to train the embedding model on (Anchor, Positive) pairs.

- **Cross-Encoder training dataset:** Dataset formatted to train the Cross-Encoder model on (Anchor, Positive, label) samples.

- **T5 Detection training dataset:** Dataset formatted to train the T5 model on (prompt, response) pairs.
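
To make the three record formats concrete, here is a minimal Python sketch of one record type per dataset, plus a helper that builds a T5 prompt from the prefix given in the T5 Detection model description. The field names follow the tuples listed above. Everything else is an assumption: that the anchor is the legitimate domain and the positive is the candidate, that labels are encoded as 1/0, that the two domains in the prompt are separated by a single space, and that responses are the words "true"/"false". The class names, helper names and example domains are hypothetical.

```python
from dataclasses import dataclass

# Prefix quoted in the T5 Detection model description.
T5_PREFIX = "Is the first domain a typosquat of the second : "


@dataclass
class EmbedderPair:
    """(Anchor, Positive) pair for the Embedder dataset."""
    anchor: str    # assumed: legitimate domain
    positive: str  # assumed: a typosquat of the anchor


@dataclass
class CrossEncoderSample:
    """(Anchor, Positive, label) sample for the Cross-Encoder dataset."""
    anchor: str
    positive: str
    label: int  # assumed encoding: 1 = typosquat, 0 = not a typosquat


@dataclass
class T5Sample:
    """(prompt, response) pair for the T5 Detection dataset."""
    prompt: str
    response: str


def build_t5_prompt(candidate: str, legitimate: str) -> str:
    # Candidate first, legitimate second, as in the model description.
    # Assumption: the two domains are separated by a single space.
    return f"{T5_PREFIX}{candidate} {legitimate}"


def to_t5_sample(sample: CrossEncoderSample) -> T5Sample:
    # Assumptions: anchor = legitimate domain, positive = candidate,
    # and the hypothetical response strings "true" / "false".
    return T5Sample(
        prompt=build_t5_prompt(sample.positive, sample.anchor),
        response="true" if sample.label == 1 else "false",
    )


sample = CrossEncoderSample(anchor="google.com", positive="g00gle.com", label=1)
print(to_t5_sample(sample))
# T5Sample(prompt='Is the first domain a typosquat of the second : g00gle.com google.com', response='true')
```

Deriving T5 pairs from Cross-Encoder samples, as `to_t5_sample` does, is only one way to keep the datasets consistent with each other; the published datasets may have been built differently.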

### Spaces

Several Spaces are provided for trying out the models above.