---
license: apache-2.0
datasets:
- stanfordnlp/sst2
- fancyzhx/yelp_polarity
language:
- en
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
new_version: AirrStorm/DistilBERT-SST2-Yelp
library_name: transformers
---
# Model Card for Fine-tuned DistilBERT-SST2 with Yelp Polarity
## Model Description
This model is a fine-tuned version of `distilbert-base-uncased`, a distilled version of BERT optimized for efficiency. It was first fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset and then further fine-tuned on the Yelp Polarity dataset for improved sentiment classification performance. The model classifies English text into two categories: positive and negative sentiment.
DistilBERT-SST2-Yelp is lightweight, fast, and ideal for sentiment analysis tasks on short texts such as customer reviews, product feedback, and social media posts.
---
## Intended Uses & Limitations
### Intended Uses:
- Sentiment analysis on short English texts, including:
  - Reviews (e.g., product, restaurant, or movie reviews)
- Comments
- Tweets or other social media posts
- Applications requiring efficient, low-latency inference for real-time analysis.
### Limitations:
- **Domain Specificity:** Fine-tuned on SST-2 and Yelp Polarity, so it may not generalize well to highly specific or niche domains.
- **Context Length:** Optimized for short texts; inputs beyond DistilBERT's 512-token limit are truncated, so long-form inputs may be classified poorly.
- **Language Support:** Works only for English text.
- **Biases:** May inherit biases present in the datasets, including biases related to language usage in sentiment analysis tasks.
---
## How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer from the Hub
model_id = "AirrStorm/DistilBERT-SST2-Yelp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Example input
text = "This movie was fantastic!"

# Tokenize and predict (no gradients needed for inference)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()

# Class mapping: 0 -> Negative, 1 -> Positive
print("Predicted Sentiment:", "Positive" if predicted_class == 1 else "Negative")
```
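For quick experiments, the same checkpoint can also be run through the high-level `pipeline` API. A minimal sketch (whether the output labels read `LABEL_0`/`LABEL_1` or human-readable names depends on the checkpoint's `id2label` config):
```python
from transformers import pipeline

# Build a text-classification pipeline on top of the fine-tuned checkpoint
classifier = pipeline("text-classification", model="AirrStorm/DistilBERT-SST2-Yelp")

# A single string or a list of strings both work
print(classifier("The food was cold and the service was slow."))
# e.g. [{'label': 'LABEL_0', 'score': 0.98}]  (label names depend on the config)
```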
## Limitations and Bias
- The SST-2 and Yelp Polarity datasets may reflect cultural, contextual, or domain-specific biases in sentiment interpretation.
- Over-reliance on specific patterns or keywords from the training data may lead to incorrect classifications, especially in nuanced or ambiguous cases.
- The model is not suitable for multilingual sentiment analysis or for detecting sentiment in specialized fields (e.g., legal, medical).
## Training Data
- **SST-2**: The Stanford Sentiment Treebank (SST-2) dataset, containing movie reviews labeled as positive or negative.
- **Yelp Polarity**: A dataset of customer reviews from Yelp, labeled as positive or negative.
The model was fine-tuned on both datasets in sequence (SST-2 first, then Yelp Polarity) to improve its performance across a range of sentiment classification tasks.
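Both corpora are hosted on the Hugging Face Hub under the IDs listed in this card's metadata. A minimal sketch for loading and inspecting them (note the different text columns: `sentence` in SST-2, `text` in Yelp Polarity):
```python
from datasets import load_dataset

# Load both training corpora from the Hub (IDs from this card's metadata)
sst2 = load_dataset("stanfordnlp/sst2")
yelp = load_dataset("fancyzhx/yelp_polarity")

print(sst2["train"][0])  # {'idx': ..., 'sentence': ..., 'label': 0 or 1}
print(yelp["train"][0])  # {'text': ..., 'label': 0 or 1}
```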
## Training Procedure
- **Base Model**: distilbert-base-uncased
- **Framework**: Hugging Face Transformers
- **Optimizer**: AdamW with weight decay
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, using gradient accumulation)
- **Epochs**: 3
- **Evaluation Strategy**: Per epoch
- **Hardware**: NVIDIA RTX 4060 with CUDA support
### Optimizations:
- Mixed precision (fp16) for faster training and reduced memory usage.
- Gradient accumulation for simulating larger batch sizes.
- Learning rate warmup and weight decay for stable convergence.
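The training script itself is not included in this card; the sketch below maps the hyperparameters above onto `TrainingArguments`. The per-device batch size, accumulation steps, weight-decay value, and warmup ratio are assumptions chosen to match the stated effective batch size of 32 and the optimizations listed:
```python
from transformers import TrainingArguments

# Hedged sketch of the setup above; values marked "assumed" are not from the card
training_args = TrainingArguments(
    output_dir="DistilBERT-SST2-Yelp",
    learning_rate=2e-5,                  # as listed above
    per_device_train_batch_size=16,      # assumed: 16 x 2 accumulation = 32 effective
    gradient_accumulation_steps=2,       # assumed
    num_train_epochs=3,
    eval_strategy="epoch",               # `evaluation_strategy` in older transformers
    weight_decay=0.01,                   # AdamW weight decay (exact value assumed)
    warmup_ratio=0.1,                    # LR warmup (exact ratio assumed)
    fp16=True,                           # mixed precision
)
```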
## Evaluation Results
| **Dataset Split** | **Accuracy** |
|-------------------|--------------|
| Train (SST-2) | 98.5% |
| Test (SST-2) | 94.7% |
| Train (Yelp) | 93.5% |
| Test (Yelp) | 92.0% |
**Evaluation Metric**: Accuracy, computed using the Hugging Face `evaluate` library.
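A toy sketch of the accuracy computation with `evaluate.load("accuracy")` (in practice, the predictions and references come from running the model over a dataset split):
```python
import evaluate

# Accuracy metric from the Hugging Face evaluate library
accuracy = evaluate.load("accuracy")

# Toy values; in practice these are model predictions vs. gold labels
print(accuracy.compute(predictions=[1, 0, 1], references=[1, 0, 0]))
# {'accuracy': 0.6666666666666666}
```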
## Future Work
- Fine-tune on more diverse datasets, including domain-specific datasets for enhanced performance in other areas.
- Extend support to multilingual sentiment analysis.
- Improve efficiency for deployment through techniques such as pruning, quantization, or distillation.
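As one illustration of the quantization direction, PyTorch's dynamic quantization can convert the model's linear layers to int8 for CPU inference. This is a hedged sketch of a generic technique, not a tested recipe for this checkpoint:
```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("AirrStorm/DistilBERT-SST2-Yelp")

# Swap nn.Linear layers for int8 dynamic-quantized equivalents (CPU inference only)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```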
## License
The model is shared under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).