---
license: apache-2.0
datasets:
  - stanfordnlp/sst2
  - fancyzhx/yelp_polarity
language:
  - en
metrics:
  - accuracy
base_model:
  - distilbert/distilbert-base-uncased
new_version: AirrStorm/DistilBERT-SST2-Yelp
library_name: transformers
---

Model Card for Fine-tuned DistilBERT-SST2 with Yelp Polarity

Model Description

This model is a fine-tuned version of distilbert-base-uncased, a distilled version of BERT optimized for efficiency. It was first fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset and then further fine-tuned on the Yelp Polarity dataset to improve sentiment classification performance. The model classifies English text into two categories: positive and negative sentiment.

DistilBERT-SST2-Yelp is lightweight, fast, and ideal for sentiment analysis tasks on short texts such as customer reviews, product feedback, and social media posts.
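For quick experimentation, the model can also be used through the Transformers pipeline API. The sketch below assumes the Hub repository ID AirrStorm/DistilBERT-SST2-Yelp listed in the metadata; point it at a local checkpoint directory instead if you are working offline.

from transformers import pipeline

# Text-classification pipeline with the fine-tuned checkpoint
# (repo ID assumed from the model metadata)
classifier = pipeline("text-classification", model="AirrStorm/DistilBERT-SST2-Yelp")

# Run sentiment prediction on a short review
print(classifier("The service was slow and the food was cold."))
# Output is a list of {label, score} dicts; label names depend on the model config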


Intended Uses & Limitations

Intended Uses:

  • Sentiment analysis on short English texts, including:
    • Reviews (e.g., product, restaurant, or movie reviews)
    • Comments
    • Tweets or other social media posts
  • Applications requiring efficient, low-latency inference for real-time analysis.

Limitations:

  • Domain Specificity: Fine-tuned on SST-2 and Yelp Polarity, so it may not generalize well to highly specific or niche domains.
  • Context Length: Optimized for short texts and may perform poorly with long-form inputs.
  • Language Support: Works only for English text.
  • Biases: May inherit biases present in the datasets, including biases related to language usage in sentiment analysis tasks.

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
# (use the Hub ID "AirrStorm/DistilBERT-SST2-Yelp" instead of the local path to download from the Hub)
tokenizer = AutoTokenizer.from_pretrained("./DistilBERT-SST2-Yelp")
model = AutoModelForSequenceClassification.from_pretrained("./DistilBERT-SST2-Yelp")
model.eval()

# Example input
text = "This movie was fantastic!"

# Tokenize and predict (no gradients needed for inference)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()

# Class mapping: 0 -> Negative, 1 -> Positive
print("Predicted Sentiment:", "Positive" if predicted_class == 1 else "Negative")

Limitations and Bias

  • The SST-2 and Yelp Polarity datasets may reflect cultural, contextual, or domain-specific biases in sentiment interpretation.
  • Over-reliance on specific patterns or keywords from the training data may lead to incorrect classifications, especially in nuanced or ambiguous cases.
  • The model is not suitable for multilingual sentiment analysis or for detecting sentiment in specialized fields (e.g., legal, medical).

Training Data

  • SST-2: The Stanford Sentiment Treebank (SST-2) dataset, containing movie reviews labeled as positive or negative.
  • Yelp Polarity: A dataset of customer reviews from Yelp, labeled as positive or negative.

The model is fine-tuned on both datasets to improve its performance on a variety of sentiment classification tasks.
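Both corpora are available through the Hugging Face datasets library under the IDs listed in the metadata. A minimal loading sketch (column names follow the public dataset cards: sentence/label for SST-2, text/label for Yelp Polarity):

from datasets import load_dataset

# Load the two corpora from the Hugging Face Hub
sst2 = load_dataset("stanfordnlp/sst2")        # columns: idx, sentence, label
yelp = load_dataset("fancyzhx/yelp_polarity")  # columns: text, label

print(sst2["train"][0])
print(yelp["train"][0])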

Training Procedure

  • Base Model: distilbert-base-uncased
  • Framework: Hugging Face Transformers
  • Optimizer: AdamW with weight decay
  • Learning Rate: 2e-5
  • Batch Size: 32 (effective, using gradient accumulation)
  • Epochs: 3
  • Evaluation Strategy: Per epoch
  • Hardware: NVIDIA RTX 4060 with CUDA support

Optimizations:

  • Mixed precision (fp16) for faster training and reduced memory usage.
  • Gradient accumulation for simulating larger batch sizes.
  • Learning rate warmup and weight decay for stable convergence (these settings are summarized in the configuration sketch below).
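The hyperparameters above map roughly to the following Trainer configuration. This is an illustrative reconstruction rather than the exact training script; the output directory, the 16 × 2 batch/accumulation split used to reach the effective batch size of 32, and the exact weight decay and warmup values are assumptions.

from transformers import TrainingArguments

# Illustrative reconstruction of the reported hyperparameters;
# output_dir, the batch/accumulation split, and the warmup/weight-decay values are assumptions
training_args = TrainingArguments(
    output_dir="./DistilBERT-SST2-Yelp",   # assumed output path
    learning_rate=2e-5,                    # reported learning rate
    per_device_train_batch_size=16,        # assumed split giving an
    gradient_accumulation_steps=2,         # effective batch size of 32
    num_train_epochs=3,                    # reported number of epochs
    weight_decay=0.01,                     # AdamW weight decay (assumed value)
    warmup_ratio=0.1,                      # learning rate warmup (assumed value)
    eval_strategy="epoch",                 # evaluation per epoch ("evaluation_strategy" in older Transformers versions)
    fp16=True,                             # mixed precision training
)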

Evaluation Results

| Dataset Split | Accuracy |
|---------------|----------|
| Train (SST-2) | 98.5%    |
| Test (SST-2)  | 94.7%    |
| Train (Yelp)  | 93.5%    |
| Test (Yelp)   | 92.0%    |

Evaluation Metric: Accuracy, computed using the Hugging Face evaluate library.
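A minimal sketch of that accuracy computation with the evaluate library (the prediction and reference arrays here are placeholders, not real model outputs):

import evaluate

# Load the accuracy metric from the Hugging Face evaluate library
accuracy = evaluate.load("accuracy")

# Placeholder predictions/labels; in practice these come from model outputs
# on the SST-2 or Yelp Polarity test split
result = accuracy.compute(predictions=[1, 0, 1, 1], references=[1, 0, 0, 1])
print(result)  # {'accuracy': 0.75}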

Future Work

  • Fine-tune on more diverse datasets, including domain-specific datasets for enhanced performance in other areas.
  • Extend support to multilingual sentiment analysis.
  • Improve efficiency for deployment through techniques such as pruning, quantization, or distillation (see the quantization sketch below for illustration).
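None of these optimizations is applied to the released checkpoint; the sketch below only illustrates what post-training dynamic quantization could look like with PyTorch, assuming the Hub repository ID AirrStorm/DistilBERT-SST2-Yelp.

import torch
from transformers import AutoModelForSequenceClassification

# Load the fine-tuned model (repo ID assumed from the metadata)
model = AutoModelForSequenceClassification.from_pretrained("AirrStorm/DistilBERT-SST2-Yelp")

# Apply post-training dynamic quantization to the Linear layers (int8 weights);
# illustrative only, not applied to the released checkpoint
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)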

License

The model is shared under the Apache 2.0 License.