---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- logistic-regression
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---

# Logistic Regression Sentiment Analysis Model

This model is a **Logistic Regression** classifier trained on the **TripAdvisor sentiment analysis dataset**. It takes the text of a hotel review as input and predicts its sentiment as a rating from 1 to 5 stars.

## Model Details

- **Model Type**: Logistic Regression
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: `nhull/tripadvisor-split-dataset-v2` (TripAdvisor sentiment dataset, balanced labels)

## Intended Use

This model is designed to classify English-language hotel reviews by sentiment. It assigns each review a star rating from 1 to 5 that reflects the sentiment the review expresses.

## How to Use the Model

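1. **Install the required dependencies**:
    The example below imports `huggingface_hub` as well as `joblib`. `scikit-learn` is listed on the assumption that the saved model is a scikit-learn estimator, which must be installed for `joblib.load` to unpickle it.
    ```bash
    pip install joblib huggingface_hub scikit-learn
    ```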

2. **Download and load the model**:
    The example below downloads the model file from the Hugging Face Hub, loads it with `joblib`, and predicts the rating of a single review. Replace the placeholder `repo_id` with this model's repository ID:
    ```python
    from huggingface_hub import hf_hub_download
    import joblib

    # Download model from Hugging Face
    model_path = hf_hub_download(repo_id="your-username/logistic-regression-model", filename="logistic_regression_model.joblib")

    # Load the model
    model = joblib.load(model_path)

    # Predict sentiment of a review
    def predict_sentiment(review):
        return model.predict([review])[0]

    review = "This hotel was fantastic. The service was great and the room was clean."
    print(f"Predicted sentiment: {predict_sentiment(review)}")
    ```

3. **The model will return a sentiment rating** between 1 and 5 stars, where:
   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good
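
To show a prediction as a readable label, the rating can be mapped to the descriptions in the list above. The following is a minimal sketch: the helper `describe_rating` and the `RATING_LABELS` dictionary are hypothetical names, not part of the model, and the code assumes the prediction may come back as a float such as `4.0` (as the labels in the classification report suggest).

```python
# Descriptions copied from the rating list above; keys are star ratings.
RATING_LABELS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def describe_rating(rating):
    """Map a predicted rating (e.g. 4, 4.0, or "4.0") to a readable string."""
    stars = int(float(rating))  # assumption: labels may be floats like 4.0
    if stars not in RATING_LABELS:
        raise ValueError(f"Unexpected rating: {rating!r}")
    return f"{stars}/5 ({RATING_LABELS[stars]})"


print(describe_rating(4.0))
```

With the model loaded as in step 2, this could be called as `describe_rating(predict_sentiment(review))`.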

## Model Evaluation

- **Test Accuracy**: 61.05%
  
- **Classification Report** (Test Set):

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0   | 0.70      | 0.73   | 0.71     | 1600    |
| 2.0   | 0.52      | 0.50   | 0.51     | 1600    |
| 3.0   | 0.57      | 0.54   | 0.55     | 1600    |
| 4.0   | 0.55      | 0.54   | 0.55     | 1600    |
| 5.0   | 0.71      | 0.74   | 0.72     | 1600    |
| **Accuracy** | -   | -      | **0.61**  | 8000    |
| **Macro avg** | 0.61 | 0.61   | 0.61     | 8000    |
| **Weighted avg** | 0.61 | 0.61 | 0.61     | 8000    |
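
As a sanity check on the table, the macro averages can be recomputed from the per-class rows. This is a small sketch using only the values printed above; the `report` dictionary and the `macro_average` helper are illustrative names. Because every class has the same support (1600), the weighted average equals the macro average, and overall accuracy equals macro-averaged recall.

```python
# Per-class values copied from the classification report above.
report = {
    1: {"precision": 0.70, "recall": 0.73, "f1": 0.71},
    2: {"precision": 0.52, "recall": 0.50, "f1": 0.51},
    3: {"precision": 0.57, "recall": 0.54, "f1": 0.55},
    4: {"precision": 0.55, "recall": 0.54, "f1": 0.55},
    5: {"precision": 0.71, "recall": 0.74, "f1": 0.72},
}


def macro_average(metric):
    """Unweighted mean of one metric across the five classes."""
    return sum(row[metric] for row in report.values()) / len(report)


for metric in ("precision", "recall", "f1"):
    print(f"Macro {metric}: {macro_average(metric):.2f}")
```

Each line prints 0.61, matching the macro-average row and the reported accuracy.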
 
### Cross-validation Scores:

| Metric | Value |
|--------|-------|
| **Logistic Regression Cross-validation scores** | [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526] |
| **Logistic Regression Mean Cross-validation score** | 0.6080 |
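
The mean in the table can be reproduced from the five fold scores, and the spread across folds gives a rough sense of how stable the estimate is. The sketch below uses only Python's standard library; the standard deviation is an added summary and is not reported in the original evaluation.

```python
from statistics import mean, stdev

# Fold scores copied from the table above.
cv_scores = [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526]

cv_mean = mean(cv_scores)
cv_std = stdev(cv_scores)  # sample standard deviation across the 5 folds

print(f"Mean cross-validation score: {cv_mean:.4f}")
print(f"Standard deviation across folds: {cv_std:.4f}")
```

The mean prints as 0.6080, in line with the table, and the fold-to-fold spread is about one percentage point.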

## Limitations

- The model performs best on extreme ratings (F1-score of 0.71 for 1 star and 0.72 for 5 stars) and struggles with intermediate ratings (F1-scores of 0.51 to 0.55 for 2, 3, and 4 stars).
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle aspects like sarcasm or humor well, and shorter reviews may lead to less accurate predictions.