---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- logistic-regression
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---

# Logistic Regression Sentiment Analysis Model

This model is a **Logistic Regression** classifier trained on the **TripAdvisor sentiment analysis dataset**. It takes the text of a hotel review as input and predicts a sentiment rating from 1 to 5 stars.

## Model Details

- **Model Type**: Logistic Regression
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: TripAdvisor sentiment dataset (balanced labels)

## Intended Use

This model classifies hotel reviews by sentiment, assigning each review a star rating from 1 to 5.

## How to Use the Model

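1. **Install the required dependencies** (`huggingface_hub` is needed for the download step below; `scikit-learn` is assumed to be required to unpickle the model):
    ```bash
    pip install joblib huggingface_hub scikit-learn
    ```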

2. **Download and load the model**:
    Download the model file from Hugging Face, load it with `joblib`, and call `predict` on a list of review texts:
    ```python
    from huggingface_hub import hf_hub_download
    import joblib

    # Download the model from Hugging Face (replace the placeholder repo_id with the actual repository)
    model_path = hf_hub_download(repo_id="your-username/logistic-regression-model", filename="logistic_regression_model.joblib")

    # Load the model
    model = joblib.load(model_path)

    # Predict sentiment of a review
    def predict_sentiment(review):
        return model.predict([review])[0]

    review = "This hotel was fantastic. The service was great and the room was clean."
    print(f"Predicted sentiment: {predict_sentiment(review)}")
    ```

3. **The model will return a sentiment rating** between 1 and 5 stars, where:
   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good
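
The rating scale above can be turned into a small lookup helper. This is a minimal sketch: the name `describe_rating` is hypothetical, and it assumes predictions may come back as floats (the classification report lists labels as `1.0` to `5.0`), so the value is converted to an integer before the lookup.

```python
# Labels copied from the rating scale above; the helper name is illustrative.
RATING_LABELS = {1: "Very bad", 2: "Bad", 3: "Neutral", 4: "Good", 5: "Very good"}


def describe_rating(rating):
    """Map a predicted rating (int, or float such as 4.0) to a readable label."""
    stars = int(rating)
    if stars not in RATING_LABELS:
        raise ValueError(f"Expected a rating from 1 to 5, got {rating!r}")
    return f"{stars}/5 ({RATING_LABELS[stars]})"


print(describe_rating(5.0))
```

It could be applied to the output of `predict_sentiment(review)` from the previous step.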

## Model Evaluation

- **Test Accuracy**: 61.05%
  
- **Classification Report** (Test Set):

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0   | 0.70      | 0.73   | 0.71     | 1600    |
| 2.0   | 0.52      | 0.50   | 0.51     | 1600    |
| 3.0   | 0.57      | 0.54   | 0.55     | 1600    |
| 4.0   | 0.55      | 0.54   | 0.55     | 1600    |
| 5.0   | 0.71      | 0.74   | 0.72     | 1600    |
| **Accuracy** | -   | -      | **0.61**  | 8000    |
| **Macro avg** | 0.61 | 0.61   | 0.61     | 8000    |
| **Weighted avg** | 0.61 | 0.61 | 0.61     | 8000    |
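
As a sanity check on the table, the macro and weighted averages can be recomputed from the per-class rows. The sketch below uses only the rounded values shown above, so it reproduces the averages to two decimals rather than the exact figures; the variable and function names are illustrative.

```python
# Per-class (precision, recall, F1) copied from the classification report above.
report = {
    1: (0.70, 0.73, 0.71),
    2: (0.52, 0.50, 0.51),
    3: (0.57, 0.54, 0.55),
    4: (0.55, 0.54, 0.55),
    5: (0.71, 0.74, 0.72),
}
support = {label: 1600 for label in report}  # balanced test set


def macro_avg(idx):
    """Unweighted mean of one metric across the five classes."""
    return sum(row[idx] for row in report.values()) / len(report)


def weighted_avg(idx):
    """Mean of one metric weighted by each class's support."""
    total = sum(support.values())
    return sum(report[label][idx] * support[label] for label in report) / total


macro = [round(macro_avg(i), 2) for i in range(3)]
weighted = [round(weighted_avg(i), 2) for i in range(3)]
print("macro:", macro)
print("weighted:", weighted)
```

Because every class has the same support (1600), the weighted average coincides with the macro average.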
 
### Cross-validation Scores

| Metric                             | Value                                      |
|------------------------------------|--------------------------------------------|
| **Logistic Regression Cross-validation scores** | [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526] |
| **Logistic Regression Mean Cross-validation score** | 0.6080                                     |
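
The mean can be reproduced from the five fold scores, and their spread gives a rough sense of how stable the estimate is. This is a small sketch using only the numbers in the table; the standard deviation is an added illustration and is not a figure reported by the authors.

```python
import statistics

# Fold scores copied from the cross-validation table above.
cv_scores = [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526]

mean_score = statistics.mean(cv_scores)
spread = statistics.pstdev(cv_scores)  # population standard deviation across folds

print(f"Mean CV accuracy: {mean_score:.4f}")
print(f"Std across folds: {spread:.4f}")
```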

## Limitations

- The model performs best on extreme ratings (F1 of 0.71 for 1 star and 0.72 for 5 stars) and struggles with intermediate ratings (F1 between 0.51 and 0.55 for 2, 3, and 4 stars).
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle sarcasm or humor well, and predictions for short reviews may be less accurate.