firefoxrecap/URL-TITLE-classifier

@@ -1,31 +1,83 @@
 ---
 license: cc-by-4.0
 tags:
-- multi-label-classification
-- text-classification
-- onnx
-- web-classification
-- firefox-ai
-- preview
 language:
-- multilingual
 datasets:
-- tshasan/multi-label-web-classification
 base_model: Alibaba-NLP/gte-modernbert-base
 pipeline_tag: text-classification
 ---
-# modernBERT-URLTITLE-classifier-preview
 ## Model Overview
-This is a **preview version** of a multi-label web classification model fine-tuned from `Alibaba-NLP/gte-modernbert-base`. It classifies websites into multiple categories based on their URLs and titles. The model supports 11 labels: `Uncatergorized`,`News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.
-- **Developed by**: Taimur Hasan
-- **Model Type**: Multi-label Text Classification
-- **Status**: Preview (under active development
 ### Architecture
-- **Fine-tuning**: Unfroze the last 4 encoder layers and the pooler
 - **Problem Type**: Multi-label classification
-- **Output Labels**: 11 (`News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`,`Uncatergorized`)
-- **Input Format**: Concatenated string: `"{url}:{title}"`

 ---
 license: cc-by-4.0
 tags:
+  - multi-label-classification
+  - text-classification
+  - onnx
+  - web-classification
+  - firefox-ai
+  - preview
 language:
+  - multilingual
 datasets:
+  - tshasan/multi-label-web-classification
 base_model: Alibaba-NLP/gte-modernbert-base
 pipeline_tag: text-classification
 ---
+# URL-TITLE-classifier-preview
 ## Model Overview
+This is a **preview version** of a multi-label web classification model fine-tuned from [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It classifies websites into multiple categories based on their URLs and titles.
+The model supports **11 labels**:
+`Uncategorized`, `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.
+- **Developed by**: Taimur Hasan
+- **Model Type**: Multi-label Text Classification
+- **Status**: Preview (under active development)
 ### Architecture
+- **Fine-tuning Strategy**: Unfroze the last 4 encoder layers and the pooler
 - **Problem Type**: Multi-label classification
+- **Output Labels**:
+  - `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`, `Uncategorized`
+- **Input Format**: Concatenated string:
+  `"{url}:{title}"`
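The input format and multi-label output described above can be illustrated with a minimal sketch. It assumes the model emits one raw logit per label, that each label is decided independently with a sigmoid, and that the index order matches the "Output Labels" list; the authoritative order is whatever the model's own config defines. The 0.5 threshold, the helper names `build_input` and `decode`, and the example logits are hypothetical and not taken from the model card.

```python
import math

# Label order copied from the "Output Labels" list above. ASSUMPTION: the real
# index-to-label mapping should be read from the model's config, not this list.
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def build_input(url: str, title: str) -> str:
    """Join URL and title in the card's "{url}:{title}" input format."""
    return f"{url}:{title}"


def decode(logits, threshold=0.5):
    """Apply an independent sigmoid per label and keep labels at or above threshold.

    ASSUMPTION: 0.5 is an illustrative cutoff, not a documented value.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


text = build_input("https://example.com/news/tech", "New laptop review")

# Hypothetical logits standing in for real model output.
example_logits = [2.1, -1.3, -0.2, -3.0, -2.5, -4.0, -3.1, 1.4, -1.0, -2.2, -3.5]
predicted = decode(example_logits)
print(text)
print(predicted)  # ['News', 'Technology']
```

Because each label is thresholded on its own, a page can receive several labels at once or none at all, which is what distinguishes this setup from single-label classification.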
+---
+## Evaluation Metrics (Validation Data)
+| Metric                   | Value |
+|--------------------------|-------|
+| **Loss**                 | 0.207 |
+| **Hamming Loss**         | 0.083 |
+| **Exact Match**          | 0.445 |
+| **Precision (Micro)**    | 0.917 |
+| **Recall (Micro)**       | 0.917 |
+| **F1 Score (Micro)**     | 0.917 |
+| **Precision (Macro)**    | 0.795 |
+| **Recall (Macro)**       | 0.598 |
+| **F1 Score (Macro)**     | 0.677 |
+| **Precision (Weighted)** | 0.798 |
+| **Recall (Weighted)**    | 0.647 |
+| **F1 Score (Weighted)**  | 0.711 |
+| **ROC AUC (Micro)**      | 0.941 |
+| **ROC AUC (Macro)**      | 0.928 |
+| **PR AUC (Micro)**       | 0.815 |
+| **PR AUC (Macro)**       | 0.765 |
+| **Jaccard (Micro)**      | 0.848 |
+| **Jaccard (Macro)**      | 0.520 |
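Hamming Loss and Exact Match in the table above measure different things, which is why a low Hamming Loss can sit next to a modest Exact Match. The sketch below uses the standard definitions on a small made-up label matrix; the data is purely illustrative and is not drawn from the validation set.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy data: 4 samples, 4 labels each (hypothetical, for illustration only).
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]]

print(hamming_loss(y_true, y_pred))  # 0.125 (2 wrong slots out of 16)
print(exact_match(y_true, y_pred))   # 0.5   (2 fully correct samples out of 4)
```

A single wrong slot is enough to fail Exact Match for a sample, so with 11 labels per page even a small per-slot error rate lowers it quickly.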
+### Per-Label F1 Scores
+| Label           | F1 Score |
+|----------------|----------|
+| News           | 0.605    |
+| Entertainment  | 0.764    |
+| Shop           | 0.704    |
+| Chat           | 0.875    |
+| Education      | 0.763    |
+| Government     | 0.667    |
+| Health         | 0.574    |
+| Technology     | 0.738    |
+| Work           | 0.527    |
+| Travel         | 0.571    |
+| Uncategorized  | 0.657    |
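As a quick consistency check, macro F1 is the unweighted mean of the per-label F1 scores, so the table above should reproduce the reported F1 Score (Macro) of 0.677. The sketch below only re-uses the numbers printed in this card; the variable names are illustrative.

```python
# Per-label F1 scores copied from the table above.
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Macro F1: every label counts equally, regardless of how often it occurs.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # 0.677

# Labels with the lowest and highest scores.
weakest = min(per_label_f1, key=per_label_f1.get)
strongest = max(per_label_f1, key=per_label_f1.get)
print(weakest, strongest)  # Work Chat
```

The rounded mean agrees with the reported macro value, and the spread between `Work` and `Chat` shows how much per-label quality varies behind that single number.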
+---
+> **Note:** This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.