Model Card for Fine-Tuned UmBERTo

UmBERTo model fine-tuned for sentiment analysis in Italian, trained on Google Colab. It classifies Italian texts into three sentiment classes (positive, negative, and neutral) and achieves an F1-score of 0.68 on the target task.

Model Details

Model Description

A RoBERTa-based model for Italian, fine-tuned for three-class sentiment classification.

  • Developed by: Francesco Labbate, Federico Rosati
  • Model type: RoBERTa-based text classification
  • Language(s) (NLP): Italian
  • License: MIT
  • Finetuned from model: Musixmatch/umberto-commoncrawl-cased-v1

Uses

Direct Use

Sentiment classification of Italian text into the three classes it was trained on: positive, negative, and neutral.

Bias, Risks, and Limitations

The model may inherit biases present in the base model and training data. It is limited to the Italian language and the specific domain of the training data.

Recommendations

Evaluate the model's performance on your own data before using it in production, especially outside the domain of the training set.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("Frabbate/umberto-commoncrawl-cased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Frabbate/umberto-commoncrawl-cased-sentiment")

# Wrap tokenizer and model in a pipeline for quick inference
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
classifier("Che bella giornata!")
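
For illustration, the step from raw logits to a predicted class can be sketched in plain Python. The logits below are made up, and the id-to-label order is an assumption; check `model.config.id2label` for the actual mapping.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input sentence (not real model output).
logits = [-1.2, 0.3, 2.1]
labels = ["negative", "neutral", "positive"]  # assumed id order

probs = softmax(logits)
pred = labels[max(range(len(probs)), key=lambda i: probs[i])]
print(pred)
```

The class with the highest logit always wins the argmax; the softmax only matters if you also want calibrated-looking probabilities.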

Training Details

Training Data

Training was performed on the SentiPolC dataset, available at http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html.

Training Procedure

Training Hyperparameters

  • Epochs: 3
  • Batch size: 4
  • Learning rate: 2e-5
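
As a sketch, these hyperparameters map onto the transformers `TrainingArguments` API as follows. The output directory name is a placeholder, and dataset preparation is omitted.

```python
from transformers import TrainingArguments

# The hyperparameters reported above, expressed as Trainer arguments.
args = TrainingArguments(
    output_dir="umberto-sentiment",   # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
)
```

These arguments would then be passed to a `Trainer` together with the model, tokenizer, and a tokenized SentiPolC dataset.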

Metrics

  • Metric: F1-score
  • Value: 0.68
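
The card does not state how the F1-score is averaged; for a three-class task, macro-averaging is a common choice. A minimal sketch of macro-F1 on toy labels (not the actual test set):

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy example with the three sentiment classes.
y_true = ["pos", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "pos", "pos", "neu"]
print(macro_f1(y_true, y_pred, ["pos", "neg", "neu"]))
```

In practice `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.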

Results

The model achieved an F1-score of 0.68 on the test set.

Citation

@misc{musixmatch-2020-umberto,
  author = {Loreto Parisi and Simone Francia and Paolo Magnani},
  title = {UmBERTo: an Italian Language Model trained with Whole Word Masking},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/musixmatchresearch/umberto}}
}