---
license: cc-by-4.0
tags:
- multi-label-classification
- text-classification
- onnx
- web-classification
- firefox-ai
- preview
language:
- multilingual
datasets:
- tshasan/multi-label-web-classification
base_model: Alibaba-NLP/gte-modernbert-base
pipeline_tag: text-classification
library_name: transformers
---

# URL-TITLE-classifier-preview

## Model Overview

This is a **preview version** of a multi-label web classification model fine-tuned from [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It assigns one or more categories to a website based on its URL and title.

The model supports **11 labels**:
`Uncategorized`, `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.

- **Developed by**: Taimur Hasan
- **Model Type**: Multi-label Text Classification
- **Status**: Preview (under active development)

### Architecture

- **Fine-tuning Strategy**: Unfroze the last 4 encoder layers and the pooler
- **Problem Type**: Multi-label classification
- **Output Labels**:
  - `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`, `Uncategorized`
- **Input Format**: Concatenated string:
  `"{url}:{title}"`

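
As a rough illustration of the input format and multi-label output described above, the sketch below builds the `"{url}:{title}"` string and converts raw logits into label names. The helper names (`build_input`, `decode_labels`), the 0.5 threshold, the example URL, and the example logits are all assumptions made for illustration. The label order is copied from the Output Labels list and may not match the model's actual index-to-label mapping, which should be read from the model's own configuration.

```python
import math

# Label order copied from the Output Labels list; this ordering is an
# assumption, so verify it against the model's configuration before use.
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def build_input(url: str, title: str) -> str:
    """Join URL and title with a colon, matching the documented input format."""
    return f"{url}:{title}"


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_labels(logits, threshold=0.5):
    """Score each label independently and keep those at or above the threshold.

    Multi-label models use one sigmoid per label rather than a softmax,
    so zero, one, or several labels can be returned for a single input.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, x in zip(LABELS, logits) if _sigmoid(x) >= threshold]


text = build_input("https://example.com/sale", "Summer Sale on Laptops")

# Hypothetical logits for illustration only; real values come from the model.
example_logits = [-3.0, -2.5, 2.1, -4.0, -3.2, -5.0, -4.1, 0.7, -2.0, -3.5, -1.0]
predicted = decode_labels(example_logits)
```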
---

## Evaluation Metrics (Validation Data)

| Metric                   | Value |
|--------------------------|-------|
| **Loss**                 | 0.207 |
| **Hamming Loss**         | 0.083 |
| **Exact Match**          | 0.445 |
| **Precision (Micro)**    | 0.917 |
| **Recall (Micro)**       | 0.917 |
| **F1 Score (Micro)**     | 0.917 |
| **Precision (Macro)**    | 0.795 |
| **Recall (Macro)**       | 0.598 |
| **F1 Score (Macro)**     | 0.677 |
| **Precision (Weighted)** | 0.798 |
| **Recall (Weighted)**    | 0.647 |
| **F1 Score (Weighted)**  | 0.711 |
| **ROC AUC (Micro)**      | 0.941 |
| **ROC AUC (Macro)**      | 0.928 |
| **PR AUC (Micro)**       | 0.815 |
| **PR AUC (Macro)**       | 0.765 |
| **Jaccard (Micro)**      | 0.848 |
| **Jaccard (Macro)**      | 0.520 |

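
For readers less familiar with multi-label metrics, the sketch below shows how Hamming loss, exact match, and micro/macro F1 are conventionally defined, using a tiny made-up example with three labels. It is not the evaluation script behind the table above; the toy arrays and function names are purely illustrative.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    return sum(tr == pr for tr, pr in zip(y_true, y_pred)) / len(y_true)


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_micro_macro(y_true, y_pred):
    """Micro F1 pools counts over all labels; macro F1 averages per-label F1."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))
    micro = _f1(*(sum(c[i] for c in counts) for i in range(3)))
    macro = sum(_f1(*c) for c in counts) / n_labels
    return micro, macro


# Toy data: 4 samples, 3 labels (rows are samples, columns are labels).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

h_loss = hamming_loss(y_true, y_pred)       # 2 wrong cells out of 12
em = exact_match(y_true, y_pred)            # 2 of 4 samples fully correct
micro_f1, macro_f1 = f1_micro_macro(y_true, y_pred)
```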
### Per-Label F1 Scores

| Label         | F1 Score |
|---------------|----------|
| News          | 0.605    |
| Entertainment | 0.764    |
| Shop          | 0.704    |
| Chat          | 0.875    |
| Education     | 0.763    |
| Government    | 0.667    |
| Health        | 0.574    |
| Technology    | 0.738    |
| Work          | 0.527    |
| Travel        | 0.571    |
| Uncategorized | 0.657    |

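
As a quick consistency check, the reported macro F1 should equal the unweighted mean of the per-label F1 scores. The short sketch below recomputes it from the per-label values listed in this section; the variable names are illustrative.

```python
# Per-label F1 scores copied from the Per-Label F1 Scores table.
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Macro F1 is the plain average over labels, ignoring label frequency.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

# Weakest and strongest labels by F1.
weakest = min(per_label_f1, key=per_label_f1.get)
strongest = max(per_label_f1, key=per_label_f1.get)

print(f"{macro_f1:.3f}")  # 0.677, matching the reported F1 Score (Macro)
```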
---

> **Note:** This model is in preview and may not generalize well beyond its training dataset. Feedback and contributions are welcome.