---
license: apache-2.0
datasets:
- stanfordnlp/sst2
- fancyzhx/yelp_polarity
language:
- en
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
new_version: AirrStorm/DistilBERT-SST2-Yelp
library_name: transformers
---

# Model Card for Fine-tuned DistilBERT-SST2 with Yelp Polarity

## Model Description

This model is a fine-tuned version of `distilbert-base-uncased`, a distilled version of BERT optimized for efficiency. It was first trained on the Stanford Sentiment Treebank (SST-2) dataset and then further fine-tuned on the Yelp Polarity dataset to improve sentiment classification performance. The model classifies English text into two categories: positive and negative sentiment.

DistilBERT-SST2-Yelp is lightweight, fast, and well suited to sentiment analysis of short texts such as customer reviews, product feedback, and social media posts.

---

## Intended Uses & Limitations

### Intended Uses:

- Sentiment analysis on short English texts, including:
  - Reviews (e.g., product, restaurant, or movie reviews)
  - Comments
  - Tweets and other social media posts
- Applications requiring efficient, low-latency inference for real-time analysis.

### Limitations:

- **Domain Specificity:** Fine-tuned on SST-2 and Yelp Polarity, so it may not generalize well to highly specialized or niche domains.
- **Context Length:** Optimized for short texts and may perform poorly on long-form inputs.
- **Language Support:** Works only for English text.
- **Biases:** May inherit biases present in the training datasets, including biases in how sentiment is expressed and labeled.

---

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("./DistilBERT-SST2-Yelp")
model = AutoModelForSequenceClassification.from_pretrained("./DistilBERT-SST2-Yelp")

# Example input
text = "This movie was fantastic!"

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()

# Class mapping: 0 -> Negative, 1 -> Positive
print("Predicted Sentiment:", "Positive" if predicted_class == 1 else "Negative")
```

## Limitations and Bias

- The SST-2 and Yelp Polarity datasets may reflect cultural, contextual, or domain-specific biases in sentiment interpretation.
- Over-reliance on specific patterns or keywords from the training data may lead to incorrect classifications, especially in nuanced or ambiguous cases.
- The model is not suitable for multilingual sentiment analysis or for detecting sentiment in specialized fields (e.g., legal, medical).

## Training Data

- **SST-2**: The Stanford Sentiment Treebank (SST-2) dataset, containing movie review sentences labeled as positive or negative.
- **Yelp Polarity**: A dataset of customer reviews from Yelp, labeled as positive or negative.

The model is fine-tuned on both datasets to improve its performance across a variety of sentiment classification tasks.

## Training Procedure

- **Base Model**: distilbert-base-uncased
- **Framework**: Hugging Face Transformers
- **Optimizer**: AdamW with weight decay
- **Learning Rate**: 2e-5
- **Batch Size**: 32 (effective, using gradient accumulation)
- **Epochs**: 3
- **Evaluation Strategy**: Per epoch
- **Hardware**: NVIDIA RTX 4060 with CUDA support

### Optimizations:

- Mixed precision (fp16) for faster training and reduced memory usage.
- Gradient accumulation to simulate larger batch sizes.
- Learning rate warmup and weight decay for stable convergence.
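For reference, the hyperparameters above map roughly onto the `Trainer` API as in the minimal sketch below. The output directory, the per-device batch size / accumulation split, and the warmup and weight-decay values are illustrative assumptions, not the recorded training configuration.

```python
from transformers import TrainingArguments, Trainer

# Illustrative sketch of the configuration described above.
# Values marked "assumed" are not taken from the actual training run.
training_args = TrainingArguments(
    output_dir="./DistilBERT-SST2-Yelp",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,       # assumed split: 16 x 2 accumulation steps = effective 32
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    eval_strategy="epoch",                # named "evaluation_strategy" in older transformers releases
    weight_decay=0.01,                    # assumed value
    warmup_ratio=0.1,                     # assumed value
    fp16=True,                            # mixed-precision training
)

# Trainer uses AdamW by default; model and tokenized datasets are omitted here:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```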
## Evaluation Results

| **Dataset Split** | **Accuracy** |
|-------------------|--------------|
| Train (SST-2)     | 98.5%        |
| Test (SST-2)      | 94.7%        |
| Train (Yelp)      | 93.5%        |
| Test (Yelp)       | 92.0%        |

**Evaluation Metric**: Accuracy, computed using the Hugging Face `evaluate` library (see the sketch at the end of this card).

## Future Work

- Fine-tune on more diverse datasets, including domain-specific datasets, for better performance in other areas.
- Extend support to multilingual sentiment analysis.
- Improve deployment efficiency through techniques such as pruning, quantization, or distillation.

## License

The model is shared under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
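As a companion to the Evaluation Results above, the following is a minimal sketch of how accuracy could be recomputed with the `evaluate` and `datasets` libraries. The Hub ID is taken from the card metadata, the batching is simplified, and the choice of the SST-2 validation split is an assumption (the SST-2 test split is unlabeled).

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hub ID from the card metadata; a local path such as "./DistilBERT-SST2-Yelp" works the same way.
model_id = "AirrStorm/DistilBERT-SST2-Yelp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

accuracy = evaluate.load("accuracy")

# Assumption: evaluate on the labeled SST-2 validation split.
dataset = load_dataset("stanfordnlp/sst2", split="validation")

for start in range(0, len(dataset), 32):
    batch = dataset[start : start + 32]  # dict of lists: "sentence", "label", "idx"
    inputs = tokenizer(batch["sentence"], return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    accuracy.add_batch(predictions=logits.argmax(-1).tolist(), references=batch["label"])

print(accuracy.compute())  # e.g. {'accuracy': ...}
```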