---
license: cc-by-4.0
tags:
- multi-label-classification
- text-classification
- onnx
- web-classification
- firefox-ai
- preview
language:
- multilingual
datasets:
- tshasan/multi-label-web-classification
base_model: Alibaba-NLP/gte-modernbert-base
pipeline_tag: text-classification
library_name: transformers
---
# URL-TITLE-classifier-preview
## Model Overview
This is a **preview version** of a multi-label web classification model fine-tuned from [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It classifies websites into multiple categories based on their URLs and titles.
The model supports **11 labels**:
`Uncategorized`, `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.
- **Developed by**: Taimur Hasan
- **Model Type**: Multi-label Text Classification
- **Status**: Preview (under active development)
### Architecture
- **Fine-tuning Strategy**: Unfroze the last 4 encoder layers and the pooler
- **Problem Type**: Multi-label classification
- **Output Labels**: `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`, `Uncategorized`
- **Input Format**: Concatenated string `"{url}:{title}"`
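
The input format and the multi-label output can be sketched together as follows. The helper builds the `"{url}:{title}"` string, then converts one logit per label into a list of predicted labels with an independent sigmoid and a threshold. The function names, the 0.5 threshold, the example logits, and the assumption that the model's output indices follow the label order listed above are illustrative and not confirmed by this card; check the `id2label` mapping in the model config for the actual order.

```python
import math

# Label order as listed under "Output Labels".
# ASSUMPTION: this matches the model's output indices (verify against id2label).
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def format_input(url: str, title: str) -> str:
    """Build the concatenated "{url}:{title}" string described in this card."""
    return f"{url}:{title}"


def decode_labels(logits, threshold=0.5):
    """Apply a per-label sigmoid and keep every label at or above the threshold.

    ASSUMPTION: a 0.5 threshold; the card does not state which one was used.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


text = format_input("https://example.com/deals", "Weekly Deals")
print(text)  # https://example.com/deals:Weekly Deals

# Hypothetical logits standing in for the model's raw output on `text`.
example_logits = [-2.0, -1.5, 3.0, -3.0, -2.5, -4.0, -3.0, 0.2, -1.0, -2.0, -3.5]
print(decode_labels(example_logits))  # ['Shop', 'Technology']
```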
---
## Evaluation Metrics (Validation Data)
| Metric | Value |
|-----------------------|--------|
| **Loss** | 0.207 |
| **Hamming Loss** | 0.083 |
| **Exact Match** | 0.445 |
| **Precision (Micro)** | 0.917 |
| **Recall (Micro)** | 0.917 |
| **F1 Score (Micro)** | 0.917 |
| **Precision (Macro)** | 0.795 |
| **Recall (Macro)** | 0.598 |
| **F1 Score (Macro)** | 0.677 |
| **Precision (Weighted)** | 0.798 |
| **Recall (Weighted)** | 0.647 |
| **F1 Score (Weighted)** | 0.711 |
| **ROC AUC (Micro)** | 0.941 |
| **ROC AUC (Macro)** | 0.928 |
| **PR AUC (Micro)** | 0.815 |
| **PR AUC (Macro)** | 0.765 |
| **Jaccard (Micro)** | 0.848 |
| **Jaccard (Macro)** | 0.520 |
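
To make the first two set-level metrics concrete, here is a minimal sketch of Hamming loss and exact match using their standard definitions on toy data. The card does not say which library computed the values above, so the functions and the example arrays are illustrative assumptions.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose entire label vector is predicted correctly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy data (hypothetical): 4 samples, 3 labels each.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(round(hamming_loss(y_true, y_pred), 3))  # 0.167 (2 of 12 decisions wrong)
print(exact_match(y_true, y_pred))             # 0.5   (2 of 4 samples fully correct)
```

A single wrong label makes a whole sample fail exact match, which is why a low Hamming loss (0.083) can coexist with a much lower exact-match rate (0.445).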
### Per-Label F1 Scores
| Label | F1 Score |
|----------------|----------|
| News | 0.605 |
| Entertainment | 0.764 |
| Shop | 0.704 |
| Chat | 0.875 |
| Education | 0.763 |
| Government | 0.667 |
| Health | 0.574 |
| Technology | 0.738 |
| Work | 0.527 |
| Travel | 0.571 |
| Uncategorized | 0.657 |
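
As a quick consistency check, the unweighted mean of the per-label scores above reproduces the macro F1 reported in the metrics table. This assumes the standard definition of macro averaging; the card does not state how the macro value was computed.

```python
# Per-label F1 scores copied from the table above.
per_label_f1 = {
    "News": 0.605, "Entertainment": 0.764, "Shop": 0.704, "Chat": 0.875,
    "Education": 0.763, "Government": 0.667, "Health": 0.574,
    "Technology": 0.738, "Work": 0.527, "Travel": 0.571, "Uncategorized": 0.657,
}

# Macro F1 (standard definition): every label counts equally, regardless of frequency.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # 0.677
```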
---
> **Note:** This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.