---
license: cc-by-nc-4.0
language:
- az
pipeline_tag: text-classification
tags:
- sentiment
- analysis
- azerbaijani
widget:
- text: Bu mənim xoşuma gəlir
datasets:
- LocalDoc/sentiments_dataset_azerbaijani
---

# Sentiment Analysis Model for Azerbaijani Text

This repository hosts a fine-tuned XLM-RoBERTa model for sentiment analysis of Azerbaijani text. The model classifies text into three categories: negative, neutral, and positive.
|
## Model Description

The model is based on `xlm-roberta-base`, fine-tuned on a diverse dataset of Azerbaijani text samples. It is designed to identify the sentiment expressed in a text and classify it accordingly.
|
## How to Use

You can use this model directly with a text-classification pipeline, or load it with the `transformers` library for more custom usage, as shown in the example below.
|
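### Quick Start

First, install the `transformers` library if you haven't already. The example below also imports PyTorch, and `XLMRobertaTokenizer` typically needs the `sentencepiece` package:

```bash
pip install transformers torch sentencepiece
```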
|
```python
from transformers import AutoModelForSequenceClassification, XLMRobertaTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "LocalDoc/sentiment_analysis_azerbaijani"
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)


def predict_sentiment(text):
    # Encode the text using the tokenizer
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Get predictions from the model without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Convert logits to probabilities using softmax
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Get the highest probability and the corresponding class index
    top_prob, top_label = torch.max(probs, dim=-1)
    labels = ["negative", "neutral", "positive"]

    # Return the label with the highest probability
    return labels[top_label.item()], top_prob


# Example text ("I like this")
text = "Bu mənim xoşuma gəlir"

# Get the sentiment
predicted_label, probability = predict_sentiment(text)
print(f"Predicted sentiment: {predicted_label} with a probability of {probability.item():.4f}")
```
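
The pipeline route mentioned under "How to Use" can be sketched as follows. This is a minimal sketch, not the authors' reference usage: the helper name `classify_texts` and its `classifier` parameter are hypothetical, and it assumes the repository's tokenizer files load through the pipeline's default loading path. The returned label strings come from the model's configuration, so they may be generic names rather than sentiment names.

```python
MODEL_NAME = "LocalDoc/sentiment_analysis_azerbaijani"


def classify_texts(texts, classifier=None):
    """Return a (label, score) pair for each input text.

    `classifier` is a hypothetical hook: any callable that behaves like a
    text-classification pipeline. When omitted, the real pipeline is built.
    """
    if classifier is None:
        # Imported lazily so the helper can also be exercised with a stub.
        from transformers import pipeline

        # Assumption: the Hub repository loads through the default pipeline path.
        classifier = pipeline("text-classification", model=MODEL_NAME)

    # truncation=True is forwarded to the tokenizer so long inputs are cut
    # to the model's maximum length instead of raising an error.
    results = classifier(texts, truncation=True)
    return [(result["label"], result["score"]) for result in results]


# Example call (downloads the model on first use):
# for label, score in classify_texts(["Bu mənim xoşuma gəlir"]):
#     print(f"{label}: {score:.4f}")
```

Passing a list of texts lets the pipeline handle tokenization and batching in one call. If the default tokenizer loading fails for this repository, an explicitly loaded `XLMRobertaTokenizer` can be supplied through the pipeline's `tokenizer` argument.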

## Sentiment Label Information

The model outputs one class index per prediction. Each index corresponds to a sentiment category, as listed in the following table:

| Label | Sentiment |
|-------|-----------|
| 0     | Negative  |
| 1     | Neutral   |
| 2     | Positive  |
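
The table above can be applied in code with a small lookup. The sketch below is illustrative only: `ID2LABEL` and `to_sentiment` are hypothetical names, and the handling of `LABEL_<n>` strings assumes the checkpoint may expose the generic names that `transformers` falls back to when a model configuration defines no custom labels.

```python
# Class index -> sentiment name, copied from the table above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def to_sentiment(label):
    """Map a class index, a 'LABEL_<n>' string, or a sentiment name to a sentiment name."""
    if isinstance(label, str):
        # Already a sentiment name (any capitalization): normalize and return it.
        if label.lower() in ID2LABEL.values():
            return label.lower()
        # Otherwise take the trailing number, e.g. "LABEL_2" -> 2.
        label = int(label.rsplit("_", 1)[-1])
    return ID2LABEL[int(label)]


print(to_sentiment(0))          # negative
print(to_sentiment("LABEL_2"))  # positive
```

An index outside 0 to 2 raises a `KeyError`, which surfaces a mismatch between the model output and the table instead of silently mislabeling it.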

## License

The model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). This license allows you to share and adapt the model with attribution to the source, but prohibits commercial use.
|
## Contact Information

If you have any questions or suggestions, please contact us at [[email protected]].