	---
	license: apache-2.0
	datasets:
	- stanfordnlp/sst2
	- fancyzhx/yelp_polarity
	language:
	- en
	metrics:
	- accuracy
	base_model:
	- distilbert/distilbert-base-uncased
	new_version: AirrStorm/DistilBERT-SST2-Yelp
	library_name: transformers
	---

	# Model Card for Fine-tuned DistilBERT-SST2 with Yelp Polarity

	## Model Description

	This model is a fine-tuned version of `distilbert-base-uncased`, a distilled version of BERT optimized for efficiency. It was first fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset and then further fine-tuned on the Yelp Polarity dataset to improve sentiment classification performance. The model classifies English text into two categories: positive and negative sentiment.

	DistilBERT-SST2-Yelp is lightweight and fast, which makes it well suited to sentiment analysis of short texts such as customer reviews, product feedback, and social media posts.

	---

	## Intended Uses & Limitations

	### Intended Uses:
	- Sentiment analysis on short English texts, including:
	  - Reviews (e.g., product, restaurant, or movie reviews)
	  - Comments
	  - Tweets or other social media posts
	- Applications requiring efficient, low-latency inference for real-time analysis.

	### Limitations:
	- Domain Specificity: Fine-tuned on SST-2 and Yelp Polarity, so it may not generalize well to highly specific or niche domains.
	- Context Length: Optimized for short texts. DistilBERT accepts at most 512 tokens, so longer inputs must be truncated or split, and accuracy on long-form text may degrade.
	- Language Support: Works only for English text.
	- Biases: May inherit biases present in the datasets, including biases related to language usage in sentiment analysis tasks.
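If longer documents must be scored anyway, one common workaround (not evaluated in this card) is to split the token sequence into overlapping windows that each fit DistilBERT's 512-token limit, then aggregate the per-window predictions. The pure-Python sketch below shows only the windowing and a simple mean aggregation. The helper names `sliding_windows` and `average_positive_probability`, the 510-token window (leaving room for `[CLS]` and `[SEP]`), and the 128-token overlap are all hypothetical choices.

```python
from typing import List


def sliding_windows(token_ids: List[int], window: int = 510, overlap: int = 128) -> List[List[int]]:
    """Split token IDs into overlapping windows of at most `window` tokens.

    Hypothetical helper: 510 leaves room for the [CLS] and [SEP] tokens that
    DistilBERT adds, so each encoded window stays within 512 tokens.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):  # last window reached the end
            break
        start += step
    return chunks


def average_positive_probability(window_probs: List[float]) -> float:
    """Aggregate per-window P(positive) with a plain mean (one of several options)."""
    return sum(window_probs) / len(window_probs)
```

Hugging Face fast tokenizers can produce similar windows directly via `truncation=True`, `max_length`, `stride`, and `return_overflowing_tokens=True`; whether a mean, a max, or a length-weighted vote is the better aggregate depends on the data.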

	---

	## How to Use

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the fine-tuned model and tokenizer (a local directory or a Hugging Face Hub repo ID)
	tokenizer = AutoTokenizer.from_pretrained("./DistilBERT-SST2-Yelp")
	model = AutoModelForSequenceClassification.from_pretrained("./DistilBERT-SST2-Yelp")
	model.eval()  # from_pretrained already returns eval mode; this makes it explicit

	# Example input
	text = "This movie was fantastic!"

	# Tokenize and predict; gradients are not needed for inference
	inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
	with torch.no_grad():
	    outputs = model(**inputs)
	predicted_class = outputs.logits.argmax(dim=-1).item()

	# Class mapping: 0 -> Negative, 1 -> Positive
	print("Predicted Sentiment:", "Positive" if predicted_class == 1 else "Negative")
	```
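The example above returns only the arg-max class. To also get a confidence score, the logits can be passed through a softmax. The numpy sketch below is framework-agnostic: it assumes the logits were first converted with `outputs.logits.detach().cpu().numpy()` and uses the card's 0 → Negative, 1 → Positive mapping. The helper name `logits_to_sentiment` is hypothetical.

```python
import numpy as np

LABELS = {0: "Negative", 1: "Positive"}  # class mapping stated in this model card


def logits_to_sentiment(logits) -> list:
    """Convert a (batch, 2) array of logits into (label, probability) pairs."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    preds = probs.argmax(axis=-1)  # ties resolve to the first class (Negative)
    return [(LABELS[int(p)], float(probs[i, p])) for i, p in enumerate(preds)]


# Usage with made-up logits for two inputs
print(logits_to_sentiment([[-2.0, 3.0], [1.5, -0.5]]))
```

In PyTorch the same probabilities are available directly from `torch.softmax(outputs.logits, dim=-1)`.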

	## Limitations and Bias

	- The SST-2 and Yelp Polarity datasets may reflect cultural, contextual, or domain-specific biases in sentiment interpretation.
	- Over-reliance on specific patterns or keywords from the training data may lead to incorrect classifications, especially in nuanced or ambiguous cases.
	- The model is not suitable for multilingual sentiment analysis or for detecting sentiment in specialized fields (e.g., legal, medical).

	## Training Data

	- SST-2: The Stanford Sentiment Treebank (SST-2) dataset, consisting of sentences from movie reviews labeled as positive or negative.
	- Yelp Polarity: A dataset of customer reviews from Yelp, labeled as positive or negative.

	The model is fine-tuned on the two datasets in sequence (SST-2 first, then Yelp Polarity) to improve its performance across sentiment classification tasks from different domains.

	## Training Procedure

	- Base Model: distilbert-base-uncased
	- Framework: Hugging Face Transformers
	- Optimizer: AdamW with weight decay
	- Learning Rate: 2e-5
	- Batch Size: 32 (effective, using gradient accumulation)
	- Epochs: 3
	- Evaluation Strategy: Per epoch
	- Hardware: NVIDIA RTX 4060 with CUDA support
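The effective batch size of 32 is reached with gradient accumulation: gradients from several smaller micro-batches are combined before a single optimizer step. The card does not state the per-device batch size, so the numpy sketch below assumes a hypothetical split of 4 micro-batches of 8 examples (in `TrainingArguments` terms, `per_device_train_batch_size=8` and `gradient_accumulation_steps=4`). It uses a toy linear model with mean-squared-error loss, also hypothetical, to show that summing each micro-batch gradient divided by the number of accumulation steps reproduces the full-batch gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y ~ X @ w with MSE loss (hypothetical stand-in for the real model)
X = rng.normal(size=(32, 5))  # one effective batch of 32 examples
y = rng.normal(size=32)
w = rng.normal(size=5)


def mse_grad(X_batch: np.ndarray, y_batch: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Gradient of mean((X_batch @ w - y_batch) ** 2) with respect to w."""
    residual = X_batch @ w - y_batch
    return 2.0 * X_batch.T @ residual / len(y_batch)


# Reference: gradient over the full batch of 32 at once
full_grad = mse_grad(X, y, w)

# Gradient accumulation: 4 micro-batches of 8, each scaled by 1 / accumulation_steps
accumulation_steps = 4
micro_batch_size = 8
accumulated = np.zeros_like(w)
for step in range(accumulation_steps):
    sl = slice(step * micro_batch_size, (step + 1) * micro_batch_size)
    accumulated += mse_grad(X[sl], y[sl], w) / accumulation_steps

# Same gradient; in a real model only one micro-batch's activations are in memory at a time
print(np.allclose(full_grad, accumulated))
```

The equivalence holds because every micro-batch has the same size; with uneven final micro-batches the scaling would need to be weighted by batch size.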

	### Optimizations:
	- Mixed precision (fp16) for faster training and reduced memory usage.
	- Gradient accumulation for simulating larger batch sizes.
	- Learning rate warmup and weight decay for stable convergence.
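The card mentions learning-rate warmup but not the schedule shape or the warmup length. The pure-Python sketch below assumes the linear warmup followed by linear decay that the Transformers `Trainer` uses by default (`lr_scheduler_type="linear"`, the same shape as `get_linear_schedule_with_warmup`), with the peak learning rate of 2e-5 listed above. The function name and the step counts in the usage lines are hypothetical.

```python
def linear_warmup_decay_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to peak_lr over warmup_steps, then decays linearly
    back to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run: 1,000 optimizer steps, the first 100 used for warmup
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_decay_lr(s, peak_lr=2e-5, warmup_steps=100, total_steps=1000))
```

Starting from a near-zero learning rate keeps the first updates small while the freshly initialized classification head is still producing large gradients.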

	## Evaluation Results

	| Dataset Split | Accuracy |
	|---------------|----------|
	| Train (SST-2) | 98.5%    |
	| Test (SST-2)  | 94.7%    |
	| Train (Yelp)  | 93.5%    |
	| Test (Yelp)   | 92.0%    |

	Evaluation Metric: Accuracy, computed using the Hugging Face `evaluate` library.
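Accuracy is the fraction of examples whose arg-max prediction matches the label. The card computes it with the `evaluate` library. The numpy function below is a dependency-light equivalent, written in the `compute_metrics` form that the Transformers `Trainer` calls with an `(logits, labels)` evaluation prediction. It is a sketch, not the author's evaluation script.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy from a (logits, labels) pair, as passed by Trainer's EvalPrediction."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == np.asarray(labels)).mean())}


# Usage with made-up logits for four examples (3 of 4 predictions are correct)
logits = np.array([[2.0, -1.0], [0.1, 0.9], [-0.5, 0.5], [1.2, 0.3]])
labels = np.array([0, 1, 0, 0])
print(compute_metrics((logits, labels)))
```

With `evaluate`, the equivalent call is `evaluate.load("accuracy").compute(predictions=predictions, references=labels)`.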

	## Future Work

	- Fine-tune on more diverse datasets, including domain-specific datasets for enhanced performance in other areas.
	- Extend support to multilingual sentiment analysis.
	- Improve deployment efficiency through techniques such as pruning, quantization, or further distillation.

	## License

	The model is shared under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).