---
language: en
tags:
- emotion-classification
- text-classification
- distilbert
datasets:
- dair-ai/emotion
metrics:
- accuracy
---
# Emotion Classification Model
## Model Description
This model is a fine-tuned **DistilBERT** for **emotion classification**. It classifies text into one of six emotions: **sadness, joy, love, anger, fear, and surprise**. The model is designed for natural language processing applications where understanding emotions in text is valuable, such as social media analysis, customer feedback, and mental health monitoring.
## Training and Evaluation
- **Training Dataset:** [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) (16,000 examples)
- **Validation Accuracy:** 94.5%
- **Test Accuracy:** 93.1%
- **Training Time:** 169.2 seconds (~2 minutes 49 seconds)
- **Hyperparameters:**
  - Learning Rate: 5e-5
  - Batch Size (Train): 32
  - Batch Size (Validation): 64
  - Epochs: 3
  - Weight Decay: 0.01
  - Optimizer: AdamW
  - Evaluation Strategy: Epoch-based
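
For orientation, here is a minimal sketch of how the configuration above maps onto the `transformers` Trainer API. It is an illustration rather than the exact training script used; the output directory and tokenization settings are assumptions, and the label mapping follows the dair-ai/emotion label order.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Tokenize the dair-ai/emotion dataset with the DistilBERT tokenizer
dataset = load_dataset("dair-ai/emotion")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

# Label order follows the dair-ai/emotion dataset
id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=6,
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
)

args = TrainingArguments(
    output_dir="emotion-classification-model",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer releases
)

# Trainer uses AdamW by default, matching the optimizer listed above
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```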
## Usage
```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="your-username/emotion-classification-model")

# Example usage
text = "I'm so happy today!"
result = classifier(text)
print(result)
```
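
A call returns the top label with its confidence, e.g. `[{'label': 'joy', 'score': 0.98}]` (the exact label strings come from the `id2label` mapping in the model config, and the score here is illustrative). To get scores for all six emotions at once, for example to inspect borderline cases, you can pass `top_k=None`:

```python
from transformers import pipeline

# top_k=None returns a score for every label instead of only the best one
classifier = pipeline(
    "text-classification",
    model="your-username/emotion-classification-model",  # placeholder repo id
    top_k=None,
)
print(classifier("I'm so happy today!"))
# e.g. [[{'label': 'joy', 'score': 0.98}, {'label': 'love', 'score': 0.01}, ...]]
```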
## Limitations
**Biases in Dataset**

The model was trained on the [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset, which may not represent the full diversity of language use across demographics, regions, or cultures. As a result, it might underperform on text containing:
- **Slang or informal language:** for example, "I'm shook!" may not be recognized as an expression of surprise.
- **Non-standard grammar or dialects:** variants such as African American Vernacular English (AAVE) or regional dialects might lead to misclassifications.

**Limited Contextual Understanding**

The model processes each input as an isolated piece of text, with no awareness of surrounding context. For instance:
- **Sarcasm:** "Oh great, another rainy day!" may not be recognized as expressing frustration.
- **Complex or mixed emotions:** text expressing multiple emotions (e.g., "I'm angry but also relieved") is collapsed into a single label (see the probing sketch below).

**Short Texts and Ambiguity**

Performance can degrade on very short texts (one or two words) because they carry little context. For example:
- "Wow!" could plausibly express joy or surprise, and such a brief input offers no cues to disambiguate.
- Ambiguous inputs like "Okay" or "Fine" are challenging without additional context.

**Domain-Specific Language**

The model may underperform on text from specialized domains (e.g., legal, medical, or technical writing) or on code-mixed and multilingual inputs. For example, "Estoy feliz!" (Spanish for "I'm happy!") might not be recognized as expressing joy, since the model was trained on English text only.
## Potential Improvements
- **Data Augmentation:** incorporating additional datasets or generating synthetic examples could improve generalization.
- **Longer Training:** training for more epochs could marginally increase accuracy, though diminishing returns are likely.
- **Larger Models:** fine-tuning larger models such as BERT or RoBERTa may yield better results on nuanced inputs.
- **Bias Mitigation:** fairness-aware training methods or more balanced datasets could reduce the biases noted above.