---
language:
- en
tags:
- sentiment
- bert
- sentiment-analysis
- transformers
pipeline_tag: text-classification
---

# Sentiment Analysis Model for Hotel Reviews

This model performs sentiment analysis on hotel reviews, classifying each review into one of three categories: Negative, Neutral, or Positive.

## Model Description

This model is based on BERT (Bidirectional Encoder Representations from Transformers), specifically the `bert-base-uncased` checkpoint.

## Training Procedure

The model was trained on the TripAdvisor hotel reviews dataset, in which each review is associated with a rating from 1 to 5. The ratings were converted to sentiment labels as follows:

- Ratings of 1 and 2 were labelled as 'Negative'
- A rating of 3 was labelled as 'Neutral'
- Ratings of 4 and 5 were labelled as 'Positive'
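The rating-to-label mapping above can be sketched as a small helper (the function name is illustrative, not taken from the training code):

```python
def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 TripAdvisor rating to a sentiment label."""
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"
```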

The text of each review was preprocessed by lowercasing and by removing punctuation, emojis, and stop words; it was then tokenized with the BERT tokenizer.
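A minimal sketch of this preprocessing, assuming a tiny illustrative stop-word list (the exact list used in training is not specified) and approximating emoji removal by stripping non-ASCII characters:

```python
import string

# Illustrative stop-word list; the list actually used in training is not specified.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "are", "of", "to"}

def preprocess(text: str) -> str:
    """Lowercase, strip emojis and punctuation, and drop stop words."""
    text = text.lower()
    # Remove emojis (approximated here as any non-ASCII character).
    text = text.encode("ascii", errors="ignore").decode()
    # Remove punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Drop stop words.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```

The cleaned string would then be passed to the BERT tokenizer as shown in the Usage section.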

The model was trained with a learning rate of 2e-5, an epsilon of 1e-8, and a batch size of 6 for 5 epochs.
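For reference, the reported hyperparameters gathered in one place (the key names are illustrative, since the original training script is not included in this card):

```python
# Hyperparameters reported above; key names are illustrative.
TRAINING_CONFIG = {
    "learning_rate": 2e-5,  # optimizer learning rate
    "adam_epsilon": 1e-8,   # epsilon term for the optimizer
    "batch_size": 6,
    "num_epochs": 5,
}
```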

## Evaluation

The model was evaluated using a weighted F1 score.
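Weighted F1 averages the per-class F1 scores, weighting each class by its support (the number of true examples of that class). A minimal pure-Python sketch of the metric:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 averaged using class support as weights."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1
    return score
```

In practice the same number is obtained with `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")`.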

## Usage

To use the model, load it with the `transformers` library and classify a review. For example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("<Group209>")
model = AutoModelForSequenceClassification.from_pretrained("<Group209>")

text = "The hotel was great and the staff were very friendly."

# Tokenize the review and run it through the model without tracking gradients.
encoded_input = tokenizer(text, truncation=True, padding=True, return_tensors='pt')
model.eval()
with torch.no_grad():
    output = model(**encoded_input)

# Index of the highest-scoring class (label order follows the training setup above).
predictions = output.logits.argmax(dim=1)
print(predictions)
```

## Limitations and Bias

The model was trained on English data, so it may not perform well on reviews in other languages. It may also be biased toward phrases or words that are common in the training dataset.