---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- logistic-regression
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---
# Logistic Regression Sentiment Analysis Model
This model is a **Logistic Regression** classifier trained on the **TripAdvisor sentiment analysis dataset**. Given the text of a hotel review, it predicts a sentiment rating on a 1-5 star scale.
## Model Details
- **Model Type**: Logistic Regression
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: TripAdvisor sentiment dataset (balanced labels)
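For reference, a classifier like this is typically trained as a scikit-learn pipeline. The sketch below is illustrative only and assumes TF-IDF features feed the logistic regression; the actual feature extraction used for this model is not documented here, and the toy reviews are invented:

```python
# Hedged sketch: one common way to train a text classifier of this
# kind. TF-IDF features are an assumption, not the documented setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy reviews with 1-5 star labels (illustrative only)
reviews = [
    "Terrible stay, dirty room and rude staff.",
    "Not great, the bathroom needed repairs.",
    "Average hotel, nothing special.",
    "Nice location and friendly service.",
    "Fantastic hotel, spotless and welcoming!",
]
labels = [1, 2, 3, 4, 5]

# Vectorizer and classifier bundled into one object, so the fitted
# pipeline can predict directly from raw text
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(reviews, labels)

prediction = pipeline.predict(["Fantastic hotel, spotless and welcoming!"])[0]
```

Bundling the vectorizer and classifier into a single `Pipeline` is also what allows the saved model to accept raw review strings at inference time.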
## Intended Use
This model classifies hotel reviews by sentiment, assigning each review a star rating between 1 and 5.
## How to Use the Model
1. **Install the required dependencies**:
```bash
pip install huggingface_hub scikit-learn joblib
```
2. **Download and load the model**:
Download the model from the Hugging Face Hub and load it with `joblib`:
```python
from huggingface_hub import hf_hub_download
import joblib

# Download the model file from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="your-username/logistic-regression-model",
    filename="logistic_regression_model.joblib",
)

# Load the serialized model
model = joblib.load(model_path)

# Predict the sentiment (1-5 stars) of a single review
def predict_sentiment(review):
    return model.predict([review])[0]

review = "This hotel was fantastic. The service was great and the room was clean."
print(f"Predicted sentiment: {predict_sentiment(review)}")
```
3. **The model will return a sentiment rating** between 1 and 5 stars, where:
- 1: Very bad
- 2: Bad
- 3: Neutral
- 4: Good
- 5: Very good
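The mapping above can be wrapped in a small helper so downstream code reports a readable label instead of a bare number. The helper name and structure here are hypothetical, not part of the released model:

```python
# Hypothetical helper mapping the model's numeric output to the
# sentiment labels listed above
SENTIMENT_LABELS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}

def describe_rating(stars):
    """Return the human-readable sentiment for a 1-5 star rating."""
    return SENTIMENT_LABELS[int(stars)]

print(describe_rating(5))  # -> Very good
```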
## Model Evaluation
- **Test Accuracy**: 61.05% on the test set.
- **Classification Report** (Test Set):
| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0 | 0.70 | 0.73 | 0.71 | 1600 |
| 2.0 | 0.52 | 0.50 | 0.51 | 1600 |
| 3.0 | 0.57 | 0.54 | 0.55 | 1600 |
| 4.0 | 0.55 | 0.54 | 0.55 | 1600 |
| 5.0 | 0.71 | 0.74 | 0.72 | 1600 |
| **Accuracy** | - | - | **0.61** | 8000 |
| **Macro avg** | 0.61 | 0.61 | 0.61 | 8000 |
| **Weighted avg** | 0.61 | 0.61 | 0.61 | 8000 |
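A report in this format can be reproduced with `sklearn.metrics.classification_report`. The sketch below uses small invented label arrays in place of the real test set, purely to show the call:

```python
# Hedged sketch: producing a report like the one above with
# scikit-learn. The y_true / y_pred values are toy data, not the
# model's actual test-set predictions.
from sklearn.metrics import accuracy_score, classification_report

y_true = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y_pred = [1, 2, 3, 4, 5, 1, 3, 3, 5, 5]

# Overall accuracy plus per-class precision/recall/F1
accuracy = accuracy_score(y_true, y_pred)
print(classification_report(y_true, y_pred))
```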
### Cross-validation Scores
| Metric | Value |
|--------|-------|
| **Cross-validation scores (5 folds)** | 0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526 |
| **Mean cross-validation score** | 0.6080 |
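Scores like these typically come from `cross_val_score` with 5 folds; the exact setup used for this model is an assumption. The sketch below substitutes a synthetic feature matrix for the real TF-IDF features:

```python
# Hedged sketch of 5-fold cross-validation with scikit-learn.
# make_classification stands in for the real feature matrix.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the vectorized reviews and their labels
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
mean_score = scores.mean()
print(f"Mean CV score: {mean_score:.4f}")
```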
## Limitations
- The model performs well on extreme ratings (1 and 5 stars) but struggles with intermediate ratings (2, 3, and 4 stars).
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle aspects like sarcasm or humor well, and shorter reviews may lead to less accurate predictions. |