MyPoliBERT-ver03
Model Overview
MyPoliBERT-ver03 is a version of bert-base-uncased fine-tuned on multiple datasets for multi-label classification of 12 political topics: Democracy, Economy, Race, Leadership, Development, Corruption, Instability, Safety, Administration, Education, Religion, and Environment. This version updates the original YagiASAFAS/MyPoliBERT model and specifically improves classification performance for the Leadership topic.
Intended Uses and Limitations
Intended Uses
This model is intended for analyzing political texts and identifying multiple political topics, with a special focus on accurately classifying leadership-related content. It can be applied to various text sources such as news articles and social media posts.
Limitations
- The model is fine-tuned on Malaysian news and social media data (see Dataset), and details regarding these data sources are limited; its performance may therefore vary on texts from other domains or regions.
- As with most deep learning models, the internal decision process is not inherently interpretable; human review is recommended for critical applications.
- The model may not reflect recent political developments due to the static nature of its training data.
Dataset
The training and evaluation data consist of 29,226 records, split 80% for training and 20% for validation.
Data sources include:
- tnwei/ms-newspapers dataset
- Malaysian political posts from Reddit
- Malaysian political posts from Instagram
- Malaysian political posts from Facebook
Additionally, to address biases in topics and sentiment observed in news as well as social media posts and comments, a portion of the data was artificially generated using Generative AI-aided Data Augmentation.
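The card does not describe how the 80/20 split was drawn. As a minimal sketch, assuming a plain random shuffle and reusing the training seed (42) purely for illustration, a reproducible split could look like this. `train_val_split` is a hypothetical helper, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    records: Sequence[T], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle a copy of `records` with a fixed seed,
    then hold out `val_fraction` of them for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    items = list(records)
    random.Random(seed).shuffle(items)  # fixed seed -> identical split on every run
    n_val = int(len(items) * val_fraction)
    return items[n_val:], items[:n_val]  # (train, validation)


if __name__ == "__main__":
    train, val = train_val_split(range(100))
    print(len(train), len(val))  # 80 20
```

Using a dedicated `random.Random` instance keeps the split independent of any other code that touches the global random state.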
Model Architecture
- Base Model: bert-base-uncased
- Task: Multi-label classification for 12 political topics
- Output: The model outputs classification scores for each topic; in this updated version the Leadership classification has been notably improved.
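How the scores are turned into topic labels is not documented here. The sketch below assumes a standard multi-label head: one logit per topic, in the order listed above, decoded with an independent sigmoid per topic and a 0.5 threshold. The label order and the threshold are both assumptions; the real mapping should be read from the checkpoint's `config.id2label`. With `transformers`, the logits would typically come from `model(**inputs).logits`.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: label order follows the topic list in this card. Read the real
# order from the checkpoint's config.id2label before relying on it.
TOPICS = [
    "Democracy", "Economy", "Race", "Leadership", "Development", "Corruption",
    "Instability", "Safety", "Administration", "Education", "Religion",
    "Environment",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def topics_from_logits(
    logits: Sequence[float], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Multi-label decoding: score each topic independently with a sigmoid and
    keep every topic whose probability reaches `threshold` (0.5 is an assumed default)."""
    if len(logits) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} logits, got {len(logits)}")
    probs = [(topic, sigmoid(z)) for topic, z in zip(TOPICS, logits)]
    return [(topic, p) for topic, p in probs if p >= threshold]


if __name__ == "__main__":
    # Made-up logits for illustration: strong Leadership and Corruption signals.
    example = [-2.0] * len(TOPICS)
    example[TOPICS.index("Leadership")] = 1.5
    example[TOPICS.index("Corruption")] = 0.8
    for topic, p in topics_from_logits(example):
        print(f"{topic}: {p:.2f}")
```

Because each topic is thresholded on its own, a text can receive several topics or none, which is what distinguishes multi-label decoding from a softmax over mutually exclusive classes.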
Training Procedure
Hyperparameters
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 16
- mixed_precision_training: Native AMP
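The hyperparameters above imply two derived quantities: the effective batch size (per-device batch × gradient accumulation steps × devices) and the shape of the linear learning-rate schedule. The sketch below keys the listed values by the `transformers.TrainingArguments` argument names they most likely correspond to; that mapping, a single training device, "Native AMP" meaning fp16, and zero warmup steps (none are listed) are all assumptions. `effective_batch_size` and `linear_lr` are hypothetical helpers.

```python
# Hyperparameters from this card, keyed by the TrainingArguments names they
# most likely map to (an assumption, not the authors' training script).
HPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 16,
    "fp16": True,  # "Native AMP": assumed to mean fp16 autocast on a CUDA GPU
}

STEPS_PER_EPOCH = 674  # optimizer steps per epoch, taken from the Training Results log


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(HPARAMS["per_device_train_batch_size"],
                               HPARAMS["gradient_accumulation_steps"]))  # 32
    total = STEPS_PER_EPOCH * HPARAMS["num_train_epochs"]  # 10,784 planned steps
    for epoch in (1, 7, 16):
        print(epoch, linear_lr(epoch * STEPS_PER_EPOCH, HPARAMS["learning_rate"], total))
```

Under these assumptions the schedule is sized for all 16 configured epochs (10,784 steps), so at the end of epoch 7 (step 4,718) the learning rate would still be 3e-5 × 9/16 = 1.6875e-5.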
Training Configuration
Training followed a standard procedure with evaluation at the end of each epoch. Although num_epochs was set to 16, the training log ends at epoch 7, and the epoch-7 checkpoint was selected as the best based on overall performance metrics.
Evaluation and Performance
The model achieves the following results on the evaluation set (validation loss: 0.2655):
Topic | F1 | Accuracy |
---|---|---|
Democracy | 0.9312 | 0.9318 |
Economy | 0.9143 | 0.9151 |
Race | 0.9449 | 0.9456 |
Leadership | 0.8488 | 0.8494 |
Development | 0.8710 | 0.8748 |
Corruption | 0.9420 | 0.9441 |
Instability | 0.9164 | 0.9198 |
Safety | 0.9042 | 0.9032 |
Administration | 0.8831 | 0.8891 |
Education | 0.9565 | 0.9567 |
Religion | 0.9426 | 0.9424 |
Environment | 0.9745 | 0.9746 |
Overall | 0.9191 | 0.9206 |
Nine of the twelve topics reach an F1 score above 0.90. Leadership (0.8488), Development (0.8710), and Administration (0.8831) remain the weakest categories, although Leadership shows a particular improvement compared to the original model.
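The reported Overall scores are consistent with an unweighted (macro) average of the twelve per-topic scores: the mean per-topic F1 is 0.919125 and the mean per-topic accuracy is 0.92055, which match 0.9191 and 0.9206 within rounding. This is an inference from the numbers rather than something the card states; the check can be reproduced with the per-topic values transcribed from the evaluation results above.

```python
from statistics import fmean

# Per-topic evaluation scores transcribed from the evaluation results above.
TOPIC_F1 = {
    "Democracy": 0.9312, "Economy": 0.9143, "Race": 0.9449,
    "Leadership": 0.8488, "Development": 0.8710, "Corruption": 0.9420,
    "Instability": 0.9164, "Safety": 0.9042, "Administration": 0.8831,
    "Education": 0.9565, "Religion": 0.9426, "Environment": 0.9745,
}
TOPIC_ACCURACY = {
    "Democracy": 0.9318, "Economy": 0.9151, "Race": 0.9456,
    "Leadership": 0.8494, "Development": 0.8748, "Corruption": 0.9441,
    "Instability": 0.9198, "Safety": 0.9032, "Administration": 0.8891,
    "Education": 0.9567, "Religion": 0.9424, "Environment": 0.9746,
}


def macro_average(scores: dict) -> float:
    """Unweighted mean over topics: every topic counts equally."""
    return fmean(scores.values())


if __name__ == "__main__":
    print(f"macro F1:       {macro_average(TOPIC_F1):.4f}")  # 0.9191
    print(f"macro accuracy: {macro_average(TOPIC_ACCURACY):.4f}")
```

A macro average gives each topic equal weight regardless of how often it occurs, so a weak topic such as Leadership pulls the overall score down as much as any other topic would.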
Training Results
Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.448 | 1.0 | 674 | 0.2781 | 0.8973 | 0.9201 | 0.8952 | 0.9062 | 0.9346 | 0.9385 | 0.8199 | 0.8340 | 0.8462 | 0.8672 | 0.9210 | 0.9302 | 0.8873 | 0.9084 | 0.8869 | 0.8947 | 0.8307 | 0.8700 | 0.9344 | 0.9467 | 0.9219 | 0.9304 | 0.9565 | 0.9619 | 0.8943 | 0.9090 |
0.2646 | 2.0 | 1348 | 0.2372 | 0.9232 | 0.9335 | 0.9111 | 0.9144 | 0.9438 | 0.9467 | 0.8406 | 0.8403 | 0.8669 | 0.8739 | 0.9385 | 0.9424 | 0.9222 | 0.9278 | 0.9038 | 0.9081 | 0.8724 | 0.8869 | 0.9543 | 0.9580 | 0.9380 | 0.9409 | 0.9732 | 0.9734 | 0.9157 | 0.9205 |
0.1696 | 3.0 | 2022 | 0.2291 | 0.9277 | 0.9333 | 0.9132 | 0.9177 | 0.9441 | 0.9469 | 0.8465 | 0.8503 | 0.8768 | 0.8847 | 0.9423 | 0.9454 | 0.9219 | 0.9255 | 0.9104 | 0.9114 | 0.8806 | 0.8919 | 0.9592 | 0.9597 | 0.9407 | 0.9419 | 0.9753 | 0.9766 | 0.9199 | 0.9238 |
0.1309 | 4.0 | 2696 | 0.2374 | 0.9290 | 0.9344 | 0.9168 | 0.9175 | 0.9441 | 0.9452 | 0.8454 | 0.8470 | 0.8733 | 0.8804 | 0.9433 | 0.9465 | 0.9215 | 0.9233 | 0.9101 | 0.9096 | 0.8762 | 0.8758 | 0.9577 | 0.9597 | 0.9389 | 0.9408 | 0.9740 | 0.9740 | 0.9192 | 0.9212 |
0.1085 | 5.0 | 3370 | 0.2414 | 0.9314 | 0.9346 | 0.9166 | 0.9175 | 0.9419 | 0.9452 | 0.8492 | 0.8459 | 0.8747 | 0.8808 | 0.9435 | 0.9463 | 0.9218 | 0.9257 | 0.9070 | 0.9083 | 0.8862 | 0.8921 | 0.9574 | 0.9588 | 0.9420 | 0.9426 | 0.9732 | 0.9736 | 0.9204 | 0.9226 |
0.0759 | 6.0 | 4044 | 0.2556 | 0.9311 | 0.9313 | 0.9153 | 0.9162 | 0.9465 | 0.9473 | 0.8492 | 0.8511 | 0.8743 | 0.8810 | 0.9431 | 0.9447 | 0.9185 | 0.9205 | 0.9049 | 0.9034 | 0.8797 | 0.8886 | 0.9588 | 0.9601 | 0.9419 | 0.9421 | 0.9753 | 0.9757 | 0.9199 | 0.9218 |
0.0618 | 7.0 | 4718 | 0.2655 | 0.9312 | 0.9318 | 0.9143 | 0.9151 | 0.9449 | 0.9456 | 0.8488 | 0.8494 | 0.8710 | 0.8748 | 0.9420 | 0.9441 | 0.9164 | 0.9198 | 0.9042 | 0.9032 | 0.8831 | 0.8891 | 0.9565 | 0.9567 | 0.9426 | 0.9424 | 0.9745 | 0.9746 | 0.9191 | 0.9206 |
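The card does not say exactly which metric "overall performance metrics" refers to, and the choice affects which epoch counts as best. The sketch below scans a few columns transcribed from the table above and returns the best epoch under a given criterion; `best_epoch` is a hypothetical helper.

```python
from typing import Dict, List

# Selected columns transcribed from the Training Results table.
LOG: List[Dict[str, float]] = [
    {"epoch": 1, "val_loss": 0.2781, "overall_f1": 0.8943, "overall_acc": 0.9090},
    {"epoch": 2, "val_loss": 0.2372, "overall_f1": 0.9157, "overall_acc": 0.9205},
    {"epoch": 3, "val_loss": 0.2291, "overall_f1": 0.9199, "overall_acc": 0.9238},
    {"epoch": 4, "val_loss": 0.2374, "overall_f1": 0.9192, "overall_acc": 0.9212},
    {"epoch": 5, "val_loss": 0.2414, "overall_f1": 0.9204, "overall_acc": 0.9226},
    {"epoch": 6, "val_loss": 0.2556, "overall_f1": 0.9199, "overall_acc": 0.9218},
    {"epoch": 7, "val_loss": 0.2655, "overall_f1": 0.9191, "overall_acc": 0.9206},
]


def best_epoch(log: List[Dict[str, float]], metric: str, higher_is_better: bool = True) -> int:
    """Return the epoch with the best value of `metric` (the earliest epoch wins ties)."""
    pick = max if higher_is_better else min
    return int(pick(log, key=lambda row: row[metric])["epoch"])


if __name__ == "__main__":
    print(best_epoch(LOG, "val_loss", higher_is_better=False))  # 3
    print(best_epoch(LOG, "overall_f1"))                        # 5
    print(best_epoch(LOG, "overall_acc"))                       # 3
```

In this log the criteria do not all agree (validation loss and overall accuracy peak at epoch 3, overall F1 at epoch 5), which is why recording the selection metric alongside the checkpoint makes the choice reproducible.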
Future Improvements
- Incorporate additional data and domain adaptation techniques to further improve performance across all topics.
- Enhance model interpretability using explainability methods.
- Monitor and update the model periodically to capture evolving political trends.
License and Usage Notes
- The predictions of this model should be used as a reference and interpreted within the context of the training data limitations.
- Users are encouraged to validate model outputs with human review for critical applications.
- Regular updates and retraining are recommended to maintain relevance and accuracy.
Framework Versions
- Transformers: 4.48.2
- PyTorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0