---
language: ["ru"]
tags:
- russian
- classification
- sentiment
- multiclass
widget:
- text: "����� ������� ��� ���� �������� ����!"
---

## Sentiment model based on rubert-base-cased-conversational

This model was initialized with [rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) weights and trained on a collection of datasets gathered by [Smetanin](https://duckduckgo.com), using the same training sampling scheme as [rubert-tiny-sentiment-balanced](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced). This scheme yields a uniform distribution over the source datasets and over the three sentiment classes: negative, neutral, and positive. The datasets were prepared by David Dale and are hosted [here](https://drive.google.com/file/d/1dir_lixYfReDXxRS5oGGljH8T_f7vVqm/view).

I chose the rubert-base-cased-conversational weights because, according to Smetanin's work, this model ranks first among the multilingual and popular Russian language models with the BERT base architecture.
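The balanced sampling mentioned above can be illustrated with a minimal sketch. It assumes that "uniform" means picking a source dataset at random, then a label within that source, then an example with that source and label. This is an illustration only, not the actual training code; the function name `sample_balanced`, the tuple layout, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def sample_balanced(examples, n, seed=0):
    """Draw n examples so that sources, and labels within a source, are equally likely.

    `examples` is a list of (text, label, source) tuples (a hypothetical layout).
    """
    rng = random.Random(seed)

    # Group texts by source, then by label.
    buckets = defaultdict(lambda: defaultdict(list))
    for text, label, source in examples:
        buckets[source][label].append(text)

    batch = []
    for _ in range(n):
        source = rng.choice(sorted(buckets))         # uniform over sources
        label = rng.choice(sorted(buckets[source]))  # uniform over labels seen in that source
        text = rng.choice(buckets[source][label])    # uniform over matching examples
        batch.append((text, label, source))
    return batch


# Toy data: one large, skewed source and one small source.
toy = [(f"big-pos-{i}", 2, "big") for i in range(90)]
toy += [(f"big-neg-{i}", 0, "big") for i in range(5)]
toy += [(f"small-{i}", 1, "small") for i in range(5)]

batch = sample_balanced(toy, n=1000)
share_small = sum(1 for _, _, source in batch if source == "small") / len(batch)
print(f"share of the 'small' source in the batch: {share_small:.2f}")
```

Although the small source holds only 5% of the toy examples, it supplies roughly half of the sampled batch, which is the effect a source-balanced scheme aims for.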

### Training and Testing Details

This model was trained and tested with the code and hyperparameters from [rubert-tiny-sentiment-balanced](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced).

### Labels

The model predicts one of three labels: negative (0), neutral (1), or positive (2).
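To make the label mapping concrete, here is a small sketch that turns a three-element logit vector into a label name. The logits are made up, `ID2LABEL` and `decode` are hypothetical helper names, and the sketch assumes the classification head emits logits in index order 0, 1, 2.

```python
import math

# Label ids as listed in this model card.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, prob = decode([-1.2, 0.3, 2.5])  # made-up logits
print(label, round(prob, 3))
```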

## Results

The model outperforms rubert-tiny-sentiment-balanced on four datasets, underperforms on one (linis), and matches it on mokoron and rureviews. See the [rubert-tiny-sentiment-balanced model card](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced) for the comparison.

| Source | Macro F1 |
| ----------- | ----------- |
| SentiRuEval2016_banks | 0.88 |
| SentiRuEval2016_tele | 0.79 |
| kaggle_news | 0.73 |
| linis | 0.46 |
| mokoron | 0.98 |
| rureviews | 0.77 |
| rusentiment | 0.74 |
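
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of how often it occurs. The sketch below computes it in plain Python on made-up labels; `macro_f1` is a hypothetical helper, not the evaluation code used to produce the table.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2*TP / (2*TP + FP + FN); taken as 0 here when the class
        # appears in neither the gold labels nor the predictions.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0, 0, 1, 1, 2, 2]  # made-up gold labels
y_pred = [0, 1, 1, 1, 2, 0]  # made-up predictions
score = macro_f1(y_true, y_pred)
print(round(score, 3))  # → 0.656
```

In this toy case the per-class F1 scores are 0.5, 0.8, and about 0.667, and their plain average gives the printed value.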