---
license: cc-by-4.0
tags:
- multi-label-classification
- text-classification
- onnx
- web-classification
- firefox-ai
- preview
language:
- multilingual
datasets:
- tshasan/multi-label-web-classification
base_model: Alibaba-NLP/gte-modernbert-base
pipeline_tag: text-classification
library_name: transformers
---

# URL-TITLE-classifier-preview

## Model Overview

This is a **preview version** of a multi-label web classification model fine-tuned from [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It classifies websites into multiple categories based on their URLs and titles. The model supports **11 labels**: `Uncategorized`, `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, and `Travel`.

- **Developed by**: Taimur Hasan
- **Model Type**: Multi-label Text Classification
- **Status**: Preview (under active development)

### Architecture

- **Fine-tuning Strategy**: Unfroze the last 4 encoder layers and the pooler
- **Problem Type**: Multi-label classification
- **Output Labels**:
  - `News`, `Entertainment`, `Shop`, `Chat`, `Education`, `Government`, `Health`, `Technology`, `Work`, `Travel`, `Uncategorized`
- **Input Format**: Concatenated string: `"{url}:{title}"`

---

## Evaluation Metrics (Validation Data)

| Metric                   | Value |
|--------------------------|-------|
| **Loss**                 | 0.207 |
| **Hamming Loss**         | 0.083 |
| **Exact Match**          | 0.445 |
| **Precision (Micro)**    | 0.917 |
| **Recall (Micro)**       | 0.917 |
| **F1 Score (Micro)**     | 0.917 |
| **Precision (Macro)**    | 0.795 |
| **Recall (Macro)**       | 0.598 |
| **F1 Score (Macro)**     | 0.677 |
| **Precision (Weighted)** | 0.798 |
| **Recall (Weighted)**    | 0.647 |
| **F1 Score (Weighted)**  | 0.711 |
| **ROC AUC (Micro)**      | 0.941 |
| **ROC AUC (Macro)**      | 0.928 |
| **PR AUC (Micro)**       | 0.815 |
| **PR AUC (Macro)**       | 0.765 |
| **Jaccard (Micro)**      | 0.848 |
| **Jaccard (Macro)**      | 0.520 |

### Per-Label F1 Scores

| Label          | F1 Score |
|----------------|----------|
| News           | 0.605    |
| Entertainment  | 0.764    |
| Shop           | 0.704    |
| Chat           | 0.875    |
| Education      | 0.763    |
| Government     | 0.667    |
| Health         | 0.574    |
| Technology     | 0.738    |
| Work           | 0.527    |
| Travel         | 0.571    |
| Uncategorized  | 0.657    |

---

> **Note:** This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.
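
The input format and multi-label output described under Architecture can be sketched in plain Python. This is a minimal illustration, not the author's inference code: the helper names (`build_input`, `decode`), the 0.5 threshold, the label order, and the example logits are all assumptions. Check the model's own configuration for the real label mapping before relying on it.

```python
import math

# Assumed label order, copied from the Architecture list above.
# The real index-to-label mapping lives in the model's config and may differ.
LABELS = [
    "News", "Entertainment", "Shop", "Chat", "Education", "Government",
    "Health", "Technology", "Work", "Travel", "Uncategorized",
]


def build_input(url: str, title: str) -> str:
    """Join URL and title in the "{url}:{title}" format the card describes."""
    return f"{url}:{title}"


def sigmoid(x: float) -> float:
    """Map one raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, threshold=0.5):
    """Return every label whose probability reaches the threshold.

    Multi-label classification scores each label independently, so zero,
    one, or several labels can be returned for the same input.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [
        label for label, logit in zip(LABELS, logits)
        if sigmoid(logit) >= threshold
    ]


text = build_input("https://example.com/deals", "Weekly laptop deals")

# Made-up logits standing in for real model output.
example_logits = [-3.0, -2.0, 2.0, -4.0, -1.5, -3.0, -2.5, 1.0, -2.0, -3.5, -4.0]
predicted = decode(example_logits)

print(text)       # https://example.com/deals:Weekly laptop deals
print(predicted)  # ['Shop', 'Technology']
```

The threshold is a tunable choice: lowering it trades precision for recall, which may matter for labels with lower F1 scores in the table above.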
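
For readers unfamiliar with the multi-label metrics in the evaluation table, the sketch below computes Hamming loss, exact match, and micro-averaged Jaccard on a tiny made-up example. The card does not state how its numbers were produced, so these are the standard definitions rather than the author's evaluation script, and the toy data is purely illustrative.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


def jaccard_micro(y_true, y_pred):
    """Intersection over union of positive labels, pooled over all samples."""
    inter = union = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            inter += 1 if (t and p) else 0
            union += 1 if (t or p) else 0
    # With no positive labels anywhere, treat the prediction as a perfect match.
    return inter / union if union else 1.0


# Toy data: 3 samples, 4 labels each (1 = label applies).
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0]]

print(hamming_loss(y_true, y_pred))   # 0.25
print(exact_match(y_true, y_pred))    # 0.333...
print(jaccard_micro(y_true, y_pred))  # 0.5
```

Exact match is the strictest of the three because a single wrong label fails the whole sample, which is why it can sit well below the per-slot scores.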