---
license: cc-by-4.0
tags:
- multi-label-classification
- text-classification
- onnx
- web-classification
- firefox-ai
- preview
language:
- multilingual
datasets:
- tshasan/multi-label-web-classification
base_model: Alibaba-NLP/gte-modernbert-base
pipeline_tag: text-classification
library_name: transformers
---

# URL-TITLE-classifier-preview

## Model Overview

This is a **preview version** of a multi-label web classification model fine-tuned from [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It assigns one or more categories to a web page based on its URL and title.

The model supports **11 labels**:  
`Uncategorized`, `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.

- **Developed by**: Taimur Hasan  
- **Model Type**: Multi-label Text Classification  
- **Status**: Preview (under active development)

### Architecture

- **Fine-tuning Strategy**: Unfroze the last 4 encoder layers and the pooler
- **Problem Type**: Multi-label classification
- **Output Labels**:  
  - `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`, `Uncategorized`
- **Input Format**: Concatenated string:  
  `"{url}:{title}"`
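
As a minimal sketch of the input format and of multi-label decoding, the snippet below builds the `"{url}:{title}"` string and turns raw logits into a label list. The helper names `format_input` and `decode`, the `0.5` threshold, and the label order are assumptions made for illustration; the authoritative index-to-label mapping is whatever the model's config defines.

```python
import math

# Assumed label order (copied from the list above); check the model config's
# id2label mapping before relying on it.
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def format_input(url: str, title: str) -> str:
    """Build the concatenated input string described above."""
    return f"{url}:{title}"


def decode(logits, threshold=0.5):
    """Apply a per-label sigmoid and keep labels at or above the threshold.

    The threshold is a hypothetical default, not a value from this card.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


text = format_input("https://example.com/deals", "Weekly deals on laptops")
# Made-up logits standing in for a model forward pass on `text`.
example_logits = [2.0, -1.0, 1.5] + [-4.0] * 8
predicted = decode(example_logits)
```

Because each label gets its own sigmoid, a page can receive several labels or none; a softmax would force exactly one.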

---

## Evaluation Metrics (Validation Data)

| Metric                   | Value |
|--------------------------|-------|
| **Loss**                 | 0.207 |
| **Hamming Loss**         | 0.083 |
| **Exact Match**          | 0.445 |
| **Precision (Micro)**    | 0.917 |
| **Recall (Micro)**       | 0.917 |
| **F1 Score (Micro)**     | 0.917 |
| **Precision (Macro)**    | 0.795 |
| **Recall (Macro)**       | 0.598 |
| **F1 Score (Macro)**     | 0.677 |
| **Precision (Weighted)** | 0.798 |
| **Recall (Weighted)**    | 0.647 |
| **F1 Score (Weighted)**  | 0.711 |
| **ROC AUC (Micro)**      | 0.941 |
| **ROC AUC (Macro)**      | 0.928 |
| **PR AUC (Micro)**       | 0.815 |
| **PR AUC (Macro)**       | 0.765 |
| **Jaccard (Micro)**      | 0.848 |
| **Jaccard (Macro)**      | 0.520 |
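
Hamming loss and exact match can look inconsistent at first glance, since a low Hamming loss sits next to a modest exact-match rate. The sketch below uses a made-up toy batch (not data from this model) to show the standard definitions: Hamming loss counts wrong individual label slots, while exact match requires every label of a sample to be right.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy batch: 4 samples, 3 labels each (illustrative values only).
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]

toy_hamming = hamming_loss(y_true, y_pred)  # 2 wrong slots out of 12
toy_exact = exact_match(y_true, y_pred)     # 2 fully correct samples out of 4
```

A single wrong slot is enough to fail exact match for that sample, so with 11 labels per page the exact-match rate falls much faster than Hamming loss rises.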

### Per-Label F1 Scores

| Label           | F1 Score |
|----------------|----------|
| News           | 0.605    |
| Entertainment  | 0.764    |
| Shop           | 0.704    |
| Chat           | 0.875    |
| Education      | 0.763    |
| Government     | 0.667    |
| Health         | 0.574    |
| Technology     | 0.738    |
| Work           | 0.527    |
| Travel         | 0.571    |
| Uncategorized  | 0.657    |
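
Macro F1 is the unweighted mean of the per-label F1 scores, so the table above can be checked against the reported macro value. A short sketch using the numbers from this card:

```python
# Per-label F1 scores copied from the table above.
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Unweighted mean across the 11 labels.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # 0.677, matching F1 Score (Macro) in the metrics table
```

The gap between micro F1 (0.917) and macro F1 (0.677) reflects that macro averaging gives weaker labels such as `Work` and `Travel` the same weight as the stronger ones.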

---

> **Note:** This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.