MyPoliBERT-ver03

Model Overview

MyPoliBERT-ver03 is a fine-tuned version of bert-base-uncased, trained on multiple datasets for multi-label classification of 12 political topics: Democracy, Economy, Race, Leadership, Development, Corruption, Instability, Safety, Administration, Education, Religion, and Environment. It updates the original YagiASAFAS/MyPoliBERT model, and its main change is improved classification performance on the Leadership topic.

Intended Uses and Limitations

  • Intended Uses
    This model is intended for analyzing political texts and identifying multiple political topics, with a special focus on accurately classifying leadership-related content. It can be applied to various text sources such as news articles and social media posts.

  • Limitations

    1. The training data are documented only at a high level (see Dataset), and details about how they were collected and annotated are limited; performance may therefore vary on texts from other domains or regions.
    2. As with most deep learning models, the internal decision process is not inherently interpretable; human review is recommended for critical applications.
    3. The model may not reflect recent political developments due to the static nature of its training data.
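A minimal sketch for scoring texts with the model, assuming it loads through the standard Transformers Auto classes under the Hub ID YagiASAFAS/MyPoliBERT-ver03. The card does not document the label layout of the classification head, so the sketch returns raw logits together with config.id2label instead of guessing how to decode them. The helper name score_texts and the max_length of 512 are illustrative choices, not part of the original card.

```python
from typing import Dict, List, Tuple


def score_texts(
    texts: List[str],
    model_id: str = "YagiASAFAS/MyPoliBERT-ver03",
    max_length: int = 512,
) -> Tuple[List[List[float]], Dict[int, str]]:
    """Hypothetical helper: return raw logits per text plus the model's id2label map.

    Imports happen inside the function so this module can be imported
    without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # BERT accepts at most 512 tokens, so longer articles or posts are truncated.
    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.tolist(), dict(model.config.id2label)
```

Calling score_texts with a list of news snippets or social media posts downloads the model on first use and returns one row of logits per input; inspecting id2label first shows how those logits map to topics.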

Dataset

The training and evaluation data consist of 29,226 records, split 80% for training and 20% for validation.
The data sources are:

  • tnwei/ms-newspapers dataset
  • Malaysian political posts from Reddit
  • Malaysian political posts from Instagram
  • Malaysian political posts from Facebook

Additionally, to offset topic and sentiment biases observed in news articles and in social media posts and comments, part of the data was generated synthetically through generative-AI-aided data augmentation.
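The card does not say how the 80/20 split was drawn. As an illustration only, a seeded random split of record indices (assuming a plain shuffle, with seed 42 borrowed from the training hyperparameters; the actual procedure is unknown) yields these sizes for 29,226 records:

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_frac: float = 0.8, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle record indices reproducibly and cut them into train/validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, leaves the global random state alone
    cut = int(n * train_frac)  # floor of 80% of the records goes to training
    return indices[:cut], indices[cut:]


train_idx, val_idx = split_indices(29226)
print(len(train_idx), len(val_idx))  # 23380 5846
```

Using a local random.Random instance keeps the split reproducible without affecting other code that relies on the global generator.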

Model Architecture

  • Base Model: bert-base-uncased
  • Task: Multi-label classification for 12 political topics
  • Output: A classification score for each of the 12 topics. Compared with the original model, this version mainly improves Leadership classification.
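The card does not specify how the per-topic scores are laid out. If the head follows the common Transformers multi-label setup (problem_type set to "multi_label_classification", one logit per topic), scores can be turned into topic labels with an independent sigmoid per topic and a threshold. In this sketch the topic order (taken from the order the card lists topics), the 0.5 threshold, and the helper name decode_topics are all assumptions; the real label order should be read from the model's config.id2label.

```python
import math
from typing import List, Sequence

# Hypothetical label order; replace with the order given by model.config.id2label.
TOPICS = [
    "Democracy", "Economy", "Race", "Leadership", "Development", "Corruption",
    "Instability", "Safety", "Administration", "Education", "Religion", "Environment",
]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_topics(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Return every topic whose sigmoid probability is above the threshold."""
    if len(logits) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} logits, got {len(logits)}")
    # Each topic is scored independently, so any number of topics can be active.
    return [topic for topic, z in zip(TOPICS, logits) if sigmoid(z) > threshold]


example = [2.1, -3.0, -1.2, 0.8, -2.5, 1.7, -4.0, -0.3, -1.1, -2.2, -3.3, -5.0]
print(decode_topics(example))  # ['Democracy', 'Leadership', 'Corruption']
```

A per-topic threshold (for example, a lower one for Leadership) is a common refinement, but any such values would need to be tuned on validation data.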

Training Procedure

  • Hyperparameters

    • learning_rate: 3e-05
    • train_batch_size: 16
    • eval_batch_size: 16
    • seed: 42
    • gradient_accumulation_steps: 2
    • total_train_batch_size: 32
    • optimizer: ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08
    • lr_scheduler_type: linear
    • num_epochs: 16
    • mixed_precision_training: Native AMP
  • Training Configuration
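    The training followed a standard procedure with evaluation at the end of each epoch. Although num_epochs was set to 16, the logged run ends at epoch 7 (step 4718), and the released model is the epoch-7 checkpoint; the results under Evaluation and Performance match that epoch's row in the Training Results table.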
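The hyperparameters above map onto Transformers TrainingArguments fields roughly as follows. This is a sketch, not the authors' training script: eval_strategy and save_strategy set to "epoch" are inferred from the per-epoch Training Results, fp16=True is assumed to be what "Native AMP" refers to, and output_dir is a placeholder. The helper steps_per_epoch also checks the logged step counts, assuming one optimizer step per gradient_accumulation_steps micro-batches with any trailing partial group not counted. Under that assumption, 674 steps per epoch corresponds to 21,553 to 21,584 training examples.

```python
import math
from typing import Any, Dict


def training_kwargs() -> Dict[str, Any]:
    """Keyword arguments for transformers.TrainingArguments mirroring the listed hyperparameters."""
    return {
        "output_dir": "mypolibert-ver03",  # placeholder path
        "learning_rate": 3e-5,
        "per_device_train_batch_size": 16,
        "per_device_eval_batch_size": 16,
        "gradient_accumulation_steps": 2,
        "seed": 42,
        "optim": "adamw_torch",
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "linear",
        "num_train_epochs": 16,
        "fp16": True,  # assumption: "Native AMP" mixed precision
        "eval_strategy": "epoch",  # assumption: metrics were logged once per epoch
        "save_strategy": "epoch",
    }


def steps_per_epoch(num_examples: int, batch_size: int = 16, grad_accum: int = 2) -> int:
    """Optimizer steps per epoch, assuming a trailing partial accumulation group is not counted."""
    micro_batches = math.ceil(num_examples / batch_size)  # the last small batch is kept
    return micro_batches // grad_accum


kwargs = training_kwargs()
effective_batch = kwargs["per_device_train_batch_size"] * kwargs["gradient_accumulation_steps"]
print(effective_batch)  # 32, the listed total_train_batch_size on a single device

# Training-set sizes consistent with the 674 steps per epoch seen in the logs.
sizes = [n for n in range(20_000, 25_000) if steps_per_epoch(n) == 674]
print(min(sizes), max(sizes))  # 21553 21584
print(steps_per_epoch(23380))  # 731 for 80% of 29,226 records
```

With Transformers installed, the dictionary can be passed as TrainingArguments(**training_kwargs()). Under the same counting assumption, an 80% split of 29,226 records would give 731 steps per epoch rather than the logged 674, so the logged run appears to have used a somewhat smaller training set than the stated split; the card does not explain the difference.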

Evaluation and Performance

The model achieves the following results on the evaluation set:

  • Loss: 0.2655
  • Democracy F1: 0.9312
  • Democracy Accuracy: 0.9318
  • Economy F1: 0.9143
  • Economy Accuracy: 0.9151
  • Race F1: 0.9449
  • Race Accuracy: 0.9456
  • Leadership F1: 0.8488
  • Leadership Accuracy: 0.8494
  • Development F1: 0.8710
  • Development Accuracy: 0.8748
  • Corruption F1: 0.9420
  • Corruption Accuracy: 0.9441
  • Instability F1: 0.9164
  • Instability Accuracy: 0.9198
  • Safety F1: 0.9042
  • Safety Accuracy: 0.9032
  • Administration F1: 0.8831
  • Administration Accuracy: 0.8891
  • Education F1: 0.9565
  • Education Accuracy: 0.9567
  • Religion F1: 0.9426
  • Religion Accuracy: 0.9424
  • Environment F1: 0.9745
  • Environment Accuracy: 0.9746
  • Overall F1: 0.9191
  • Overall Accuracy: 0.9206
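The card does not define "Overall". The reported Overall F1 (0.9191) and Overall Accuracy (0.9206) agree, to within rounding, with the unweighted (macro) mean of the 12 per-topic values listed above, which this short check reproduces:

```python
from statistics import mean

# Per-topic (F1, accuracy) pairs copied from the evaluation results above.
TOPIC_SCORES = {
    "Democracy": (0.9312, 0.9318),
    "Economy": (0.9143, 0.9151),
    "Race": (0.9449, 0.9456),
    "Leadership": (0.8488, 0.8494),
    "Development": (0.8710, 0.8748),
    "Corruption": (0.9420, 0.9441),
    "Instability": (0.9164, 0.9198),
    "Safety": (0.9042, 0.9032),
    "Administration": (0.8831, 0.8891),
    "Education": (0.9565, 0.9567),
    "Religion": (0.9426, 0.9424),
    "Environment": (0.9745, 0.9746),
}

# Unweighted mean: every topic counts equally, regardless of how often it occurs.
macro_f1 = mean(f1 for f1, _ in TOPIC_SCORES.values())
macro_acc = mean(acc for _, acc in TOPIC_SCORES.values())
print(f"macro F1 = {macro_f1:.5f}, macro accuracy = {macro_acc:.5f}")
```

Because the mean is unweighted, a weaker topic such as Leadership pulls the overall score down as much as any other topic, whatever its share of the data.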

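These results show F1 scores above 0.90 for 9 of the 12 topics; Leadership (0.8488), Development (0.8710), and Administration (0.8831) remain the weakest. According to the authors, Leadership performance improved compared with the original MyPoliBERT model, whose scores are not reported in this card.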

Training Results

Validation metrics logged at the end of each epoch (one column per epoch):

Epoch                   | 1      | 2      | 3      | 4      | 5      | 6      | 7
Step                    | 674    | 1348   | 2022   | 2696   | 3370   | 4044   | 4718
Training Loss           | 0.448  | 0.2646 | 0.1696 | 0.1309 | 0.1085 | 0.0759 | 0.0618
Validation Loss         | 0.2781 | 0.2372 | 0.2291 | 0.2374 | 0.2414 | 0.2556 | 0.2655
Democracy F1            | 0.8973 | 0.9232 | 0.9277 | 0.9290 | 0.9314 | 0.9311 | 0.9312
Democracy Accuracy      | 0.9201 | 0.9335 | 0.9333 | 0.9344 | 0.9346 | 0.9313 | 0.9318
Economy F1              | 0.8952 | 0.9111 | 0.9132 | 0.9168 | 0.9166 | 0.9153 | 0.9143
Economy Accuracy        | 0.9062 | 0.9144 | 0.9177 | 0.9175 | 0.9175 | 0.9162 | 0.9151
Race F1                 | 0.9346 | 0.9438 | 0.9441 | 0.9441 | 0.9419 | 0.9465 | 0.9449
Race Accuracy           | 0.9385 | 0.9467 | 0.9469 | 0.9452 | 0.9452 | 0.9473 | 0.9456
Leadership F1           | 0.8199 | 0.8406 | 0.8465 | 0.8454 | 0.8492 | 0.8492 | 0.8488
Leadership Accuracy     | 0.8340 | 0.8403 | 0.8503 | 0.8470 | 0.8459 | 0.8511 | 0.8494
Development F1          | 0.8462 | 0.8669 | 0.8768 | 0.8733 | 0.8747 | 0.8743 | 0.8710
Development Accuracy    | 0.8672 | 0.8739 | 0.8847 | 0.8804 | 0.8808 | 0.8810 | 0.8748
Corruption F1           | 0.9210 | 0.9385 | 0.9423 | 0.9433 | 0.9435 | 0.9431 | 0.9420
Corruption Accuracy     | 0.9302 | 0.9424 | 0.9454 | 0.9465 | 0.9463 | 0.9447 | 0.9441
Instability F1          | 0.8873 | 0.9222 | 0.9219 | 0.9215 | 0.9218 | 0.9185 | 0.9164
Instability Accuracy    | 0.9084 | 0.9278 | 0.9255 | 0.9233 | 0.9257 | 0.9205 | 0.9198
Safety F1               | 0.8869 | 0.9038 | 0.9104 | 0.9101 | 0.9070 | 0.9049 | 0.9042
Safety Accuracy         | 0.8947 | 0.9081 | 0.9114 | 0.9096 | 0.9083 | 0.9034 | 0.9032
Administration F1       | 0.8307 | 0.8724 | 0.8806 | 0.8762 | 0.8862 | 0.8797 | 0.8831
Administration Accuracy | 0.8700 | 0.8869 | 0.8919 | 0.8758 | 0.8921 | 0.8886 | 0.8891
Education F1            | 0.9344 | 0.9543 | 0.9592 | 0.9577 | 0.9574 | 0.9588 | 0.9565
Education Accuracy      | 0.9467 | 0.9580 | 0.9597 | 0.9597 | 0.9588 | 0.9601 | 0.9567
Religion F1             | 0.9219 | 0.9380 | 0.9407 | 0.9389 | 0.9420 | 0.9419 | 0.9426
Religion Accuracy       | 0.9304 | 0.9409 | 0.9419 | 0.9408 | 0.9426 | 0.9421 | 0.9424
Environment F1          | 0.9565 | 0.9732 | 0.9753 | 0.9740 | 0.9732 | 0.9753 | 0.9745
Environment Accuracy    | 0.9619 | 0.9734 | 0.9766 | 0.9740 | 0.9736 | 0.9757 | 0.9746
Overall F1              | 0.8943 | 0.9157 | 0.9199 | 0.9192 | 0.9204 | 0.9199 | 0.9191
Overall Accuracy        | 0.9090 | 0.9205 | 0.9238 | 0.9212 | 0.9226 | 0.9218 | 0.9206
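Which epoch counts as "best" depends on the selection metric. The sketch below applies three criteria to the per-epoch values in the Training Results table. It mirrors the idea behind the Trainer's metric_for_best_model and greater_is_better options, but it is plain Python written for illustration, not the Trainer's own code.

```python
# Per-epoch (validation loss, Overall F1, Overall Accuracy), copied from the Training Results table.
HISTORY = {
    1: (0.2781, 0.8943, 0.9090),
    2: (0.2372, 0.9157, 0.9205),
    3: (0.2291, 0.9199, 0.9238),
    4: (0.2374, 0.9192, 0.9212),
    5: (0.2414, 0.9204, 0.9226),
    6: (0.2556, 0.9199, 0.9218),
    7: (0.2655, 0.9191, 0.9206),
}


def best_epoch(index: int, greater_is_better: bool) -> int:
    """Pick the epoch that optimizes the metric stored at position `index` of each tuple."""
    choose = max if greater_is_better else min
    return choose(HISTORY, key=lambda epoch: HISTORY[epoch][index])


print(best_epoch(0, greater_is_better=False))  # 3 (lowest validation loss)
print(best_epoch(1, greater_is_better=True))   # 5 (highest Overall F1)
print(best_epoch(2, greater_is_better=True))   # 3 (highest Overall Accuracy)
```

None of these three criteria selects epoch 7, so the checkpoint choice presumably relied on some other consideration; the card does not state which.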

Future Improvements

  • Incorporate additional data and domain adaptation techniques to further improve performance across all topics.
  • Enhance model interpretability using explainability methods.
  • Monitor and update the model periodically to capture evolving political trends.

License and Usage Notes

  • The predictions of this model should be used as a reference and interpreted within the context of the training data limitations.
  • Users are encouraged to validate model outputs with human review for critical applications.
  • Regular updates and retraining are recommended to maintain relevance and accuracy.

Framework Versions

  • Transformers: 4.48.2
  • PyTorch: 2.5.1+cu124
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
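To compare a local environment against the versions listed above, a small check with the standard-library importlib.metadata module can help. The distribution names (transformers, torch, datasets, tokenizers) are the usual PyPI names, and a package that is not installed is reported as None. This is a convenience sketch; the card does not say whether other versions reproduce the reported results.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions listed in this model card.
EXPECTED = {
    "transformers": "4.48.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def compare_versions() -> Dict[str, Optional[str]]:
    """Map each package to its installed version, or None if it is not installed."""
    installed: Dict[str, Optional[str]] = {}
    for package in EXPECTED:
        try:
            installed[package] = version(package)
        except PackageNotFoundError:
            installed[package] = None
    return installed


for name, found in compare_versions().items():
    status = "ok" if found == EXPECTED[name] else "differs"
    print(f"{name}: expected {EXPECTED[name]}, found {found} ({status})")
```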

Model weights: Safetensors format, 110M parameters, F32 tensor type.