metadata

license: apache-2.0
datasets:
  - nhull/tripadvisor-split-dataset-v2
language:
  - en
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - logistic-regression
  - text-classification
  - hotel-reviews
  - tripadvisor
  - nlp

Logistic Regression Sentiment Analysis Model

This model is a Logistic Regression classifier trained on the TripAdvisor sentiment analysis dataset. It takes the text of a hotel review as input and predicts a sentiment rating from 1 to 5 stars.

Model Details

Model Type: Logistic Regression
Task: Sentiment Analysis
Input: A hotel review (text)
Output: Sentiment rating (1-5 stars)
Training Dataset: nhull/tripadvisor-split-dataset-v2

Intended Use

This model is designed to classify hotel reviews by sentiment. For each review it returns a star rating from 1 to 5, where:

1: Very bad
2: Bad
3: Neutral
4: Good
5: Very good
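
As a minimal sketch, the scale above can be turned into a lookup that converts a predicted rating into its description. The names `RATING_DESCRIPTIONS` and `describe_rating` are hypothetical and not part of the released model; the helper also accepts whole-number floats, on the assumption that predictions may come back as 1.0 to 5.0, as the labels in the classification report suggest.

```python
# Descriptions copied from the rating scale in this card.
RATING_DESCRIPTIONS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def describe_rating(rating):
    """Return the description for a predicted star rating.

    Accepts ints or whole-number floats (for example 4 or 4.0).
    Raises ValueError for anything outside the 1 to 5 scale.
    """
    stars = int(rating)
    if stars != rating or stars not in RATING_DESCRIPTIONS:
        raise ValueError(f"expected a whole rating from 1 to 5, got {rating!r}")
    return RATING_DESCRIPTIONS[stars]


print(describe_rating(4.0))  # Good
```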

Dataset

The dataset used for training, validation, and testing is nhull/tripadvisor-split-dataset-v2. It consists of:

Training Set: 30,400 reviews
Validation Set: 1,600 reviews
Test Set: 8,000 reviews

All splits are balanced across five sentiment labels.
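
The split sizes above imply a total of 40,000 reviews. The short sketch below works out each split's share of that total and, assuming "balanced" means exactly equal counts per label, the number of reviews per label in each split. The helper name `split_summary` is illustrative only.

```python
# Split sizes as stated in this card.
SPLIT_SIZES = {"train": 30_400, "validation": 1_600, "test": 8_000}
NUM_LABELS = 5  # ratings 1 to 5


def split_summary(sizes, num_labels):
    """Return each split's share of the total and its per-label count.

    Assumes every split is perfectly balanced across the labels.
    """
    total = sum(sizes.values())
    return {
        name: {"share": count / total, "per_label": count // num_labels}
        for name, count in sizes.items()
    }


summary = split_summary(SPLIT_SIZES, NUM_LABELS)
for name, stats in summary.items():
    print(f"{name}: {stats['share']:.0%} of data, {stats['per_label']} reviews per label")
```

Under that assumption the test split holds 1,600 reviews per label, which agrees with the Support column of the classification report.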

Test Performance

On average, the model's predicted rating is 0.44 stars higher than the true rating.
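
The card does not state how the 0.44 figure was computed. One plausible reading is the mean signed error: the average of predicted minus true rating, where a positive value means the model rates too high. The sketch below shows that calculation on made-up toy data; the function name and the sample ratings are assumptions for illustration, not the model's actual predictions.

```python
def mean_signed_error(y_true, y_pred):
    """Average of (predicted - true); positive means over-prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data only: two of five reviews are rated one star too high.
toy_true = [1, 2, 3, 4, 5]
toy_pred = [2, 2, 4, 4, 5]

bias = mean_signed_error(toy_true, toy_pred)
print(f"mean signed error: {bias:+.2f} stars")  # mean signed error: +0.40 stars
```

Unlike the mean absolute error, this measure lets over- and under-predictions cancel out, so it describes the direction of the bias rather than the typical size of a mistake.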

Test Accuracy: 61.05%
Classification Report (Test Set):

Label	Precision	Recall	F1-score	Support
1.0	0.70	0.73	0.71	1600
2.0	0.52	0.50	0.51	1600
3.0	0.57	0.54	0.55	1600
4.0	0.55	0.54	0.55	1600
5.0	0.71	0.74	0.72	1600
Accuracy	-	-	0.61	8000
Macro avg	0.61	0.61	0.61	8000
Weighted avg	0.61	0.61	0.61	8000
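
As a consistency check, the F1-scores and macro averages in the table can be recomputed from the per-label precision and recall. This is a sketch using the rounded values printed above, so a recomputed F1 can differ from the reported one by 0.01 (label 4 comes out as 0.54 rather than 0.55). The names `REPORT` and `f1_from` are illustrative.

```python
# (precision, recall, reported F1) per label, copied from the table.
REPORT = {
    1: (0.70, 0.73, 0.71),
    2: (0.52, 0.50, 0.51),
    3: (0.57, 0.54, 0.55),
    4: (0.55, 0.54, 0.55),
    5: (0.71, 0.74, 0.72),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recomputed_f1 = {label: f1_from(p, r) for label, (p, r, _) in REPORT.items()}

# Macro average: unweighted mean over the five labels.
n = len(REPORT)
macro_precision = sum(p for p, _, _ in REPORT.values()) / n
macro_recall = sum(r for _, r, _ in REPORT.values()) / n
macro_f1 = sum(recomputed_f1.values()) / n

for label, (_, _, reported) in REPORT.items():
    print(f"label {label}: recomputed F1 {recomputed_f1[label]:.2f}, reported {reported:.2f}")
print(f"macro avg: {macro_precision:.2f} {macro_recall:.2f} {macro_f1:.2f}")
```

Because every label has the same support (1,600), the weighted average equals the macro average, which is why the last two rows of the table are identical.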

Limitations

The model performs best on extreme ratings (F1-scores of 0.71 and 0.72 for 1 and 5 stars) but struggles with intermediate ratings (F1-scores of 0.51 to 0.55 for 2, 3, and 4 stars).
The model was trained on the TripAdvisor dataset and may not generalize well to reviews from other sources or domains.
The model does not handle sarcasm or humor well, and predictions for shorter reviews may be less accurate.