URL-TITLE-classifier-preview

Model Overview

This is a preview version of a multi-label web classification model fine-tuned from Alibaba-NLP/gte-modernbert-base. It assigns each website one or more categories based on its URL and title.

The model supports 11 labels:
Uncategorized, News, Entertainment, Shop, Chat, Education, Government, Health, Technology, Work, and Travel.

  • Developed by: Taimur Hasan
  • Model Type: Multi-label Text Classification
  • Status: Preview (under active development)

Architecture

  • Fine-tuning Strategy: The last 4 encoder layers and the pooler were unfrozen for training
  • Problem Type: Multi-label classification
  • Output Labels:
    • News, Entertainment, Shop, Chat, Education, Government, Health, Technology, Work, Travel, Uncategorized
  • Input Format: The URL and title joined by a colon into a single string:
    "{url}:{title}"
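
The input format and multi-label output can be sketched as follows. This is a minimal example under stated assumptions, not the author's inference code: `build_input` and `classify` are hypothetical helper names, the repository id must be supplied by the caller, and the 0.5 threshold is an illustrative choice (the card does not state one). Label names are read from the checkpoint's `id2label` config rather than assumed from the order listed here.

```python
from typing import Dict


def build_input(url: str, title: str) -> str:
    """Join URL and title into the "{url}:{title}" string the card describes."""
    return f"{url}:{title}"


def classify(model_id: str, url: str, title: str, threshold: float = 0.5) -> Dict[str, float]:
    """Return {label: probability} for every label scoring at or above `threshold`.

    Assumptions: `model_id` is the Hugging Face repository id of this
    checkpoint, and 0.5 is an illustrative cutoff, not a documented one.
    """
    # Imported lazily so build_input stays usable without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(build_input(url, title), return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]

    # Multi-label: an independent sigmoid per label, not a softmax across labels.
    probs = torch.sigmoid(logits).tolist()
    return {
        model.config.id2label[i]: p
        for i, p in enumerate(probs)
        if p >= threshold
    }
```

Because each label is scored independently, a single page can receive several labels or none. The threshold may need tuning per label, given how much the per-label scores differ.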

Evaluation Metrics (Validation Data)

Metric                 Value
Loss                   0.207
Hamming Loss           0.083
Exact Match            0.445
Precision (Micro)      0.917
Recall (Micro)         0.917
F1 Score (Micro)       0.917
Precision (Macro)      0.795
Recall (Macro)         0.598
F1 Score (Macro)       0.677
Precision (Weighted)   0.798
Recall (Weighted)      0.647
F1 Score (Weighted)    0.711
ROC AUC (Micro)        0.941
ROC AUC (Macro)        0.928
PR AUC (Micro)         0.815
PR AUC (Macro)         0.765
Jaccard (Micro)        0.848
Jaccard (Macro)        0.520
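
For readers less familiar with multi-label metrics, the sketch below shows how Hamming loss, exact match, and micro and macro F1 are conventionally defined. The toy matrices are invented for illustration and are not the validation data, the function names are hypothetical, and the card does not say which library produced the reported figures.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows are samples, columns are labels (0 or 1)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label decisions that are wrong."""
    pairs = [(t, p) for t_row, p_row in zip(y_true, y_pred) for t, p in zip(t_row, p_row)]
    return sum(t != p for t, p in pairs) / len(pairs)


def exact_match(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of samples whose whole label vector is predicted correctly."""
    hits = sum(list(t_row) == list(p_row) for t_row, p_row in zip(y_true, y_pred))
    return hits / len(y_true)


def label_counts(y_true: Matrix, y_pred: Matrix, label: int) -> Tuple[int, int, int]:
    """True positives, false positives, and false negatives for one label column."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[label], p_row[label]
        tp += int(t == 1 and p == 1)
        fp += int(t == 0 and p == 1)
        fn += int(t == 1 and p == 0)
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    counts = [label_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1(tp, fp, fn)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    scores = [f1(*label_counts(y_true, y_pred, j)) for j in range(len(y_true[0]))]
    return sum(scores) / len(scores)


# Toy data: 4 samples, 3 labels (illustrative only).
Y_TRUE = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
Y_PRED = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print(hamming_loss(Y_TRUE, Y_PRED), exact_match(Y_TRUE, Y_PRED))
print(micro_f1(Y_TRUE, Y_PRED), macro_f1(Y_TRUE, Y_PRED))
```

Micro averaging pools every label decision, so frequent labels dominate the score. Macro averaging weights each label equally, so weak labels pull it down more. Exact match is the strictest of these, since one wrong label fails the whole sample.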

Per-Label F1 Scores

Label           F1 Score
News            0.605
Entertainment   0.764
Shop            0.704
Chat            0.875
Education       0.763
Government      0.667
Health          0.574
Technology      0.738
Work            0.527
Travel          0.571
Uncategorized   0.657
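
As a quick consistency check, macro F1 is the unweighted mean of the per-label F1 scores. The short sketch below averages the values listed in this section and reproduces the reported F1 Score (Macro) of 0.677. The variable names are illustrative.

```python
# Per-label F1 scores copied from the table in this section.
per_label_f1 = {
    "News": 0.605,
    "Entertainment": 0.764,
    "Shop": 0.704,
    "Chat": 0.875,
    "Education": 0.763,
    "Government": 0.667,
    "Health": 0.574,
    "Technology": 0.738,
    "Work": 0.527,
    "Travel": 0.571,
    "Uncategorized": 0.657,
}

# Macro F1 = unweighted mean over labels.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # 0.677

# The weakest and strongest labels by F1.
weakest = min(per_label_f1, key=per_label_f1.get)
strongest = max(per_label_f1, key=per_label_f1.get)
print(weakest, strongest)  # Work Chat
```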

Note: This model is in preview and may not generalize well outside of its training dataset. Feedback and contributions are welcome.

Model size: 150M params
Tensor type: F32 (Safetensors)