---
license: mit
language:
- en
- de
- es
- fr
- pt
metrics:
- accuracy
base_model:
- microsoft/mdeberta-v3-base
pipeline_tag: text-classification
tags:
- formal or informal classification
widget:
- text: Bitte geh einkaufen.
- text: Können Sie mir spontan dabei helfen?
- text: Als nächstes kommen 4g Champignons und 500g Mehl dazu.
---

# formality-classifier-mdeberta-v3-base

This model classifies texts by their formality. Each input is assigned one of three classes, `["formal", "informal", "neutral"]`. The `neutral` class covers texts that have no clear formality, such as passive statements.
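
The classification described above could be run roughly as follows. This is a minimal sketch, assuming the model is loaded through the `transformers` text-classification pipeline. The repository namespace in `MODEL_ID` is a placeholder, the helper names `classify` and `pair_with_labels` are hypothetical, and the exact label strings returned depend on the model's configuration.

```python
from typing import Dict, List, Tuple

# Placeholder repository ID (assumption): replace the namespace with the one
# this model is actually published under.
MODEL_ID = "your-namespace/formality-classifier-mdeberta-v3-base"


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict]:
    """Run the text-classification pipeline; returns one {label, score} dict per text."""
    # Imported lazily so the helper below also works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(texts)


def pair_with_labels(texts: List[str], predictions: List[Dict]) -> List[Tuple[str, str, float]]:
    """Pair each input text with its predicted label and rounded score."""
    return [
        (text, pred["label"], round(pred["score"], 3))
        for text, pred in zip(texts, predictions)
    ]


# Example call (needs network access and the transformers package):
# samples = ["Bitte geh einkaufen.", "Können Sie mir spontan dabei helfen?"]
# for row in pair_with_labels(samples, classify(samples)):
#     print(row)
```

The two sample sentences are taken from the widget examples of this model card. No expected outputs are shown because they depend on the published weights.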

Training data was selected and generated with a focus on languages that have a distinct form of formal address, including French, German, Italian, Portuguese and Spanish.

Some samples from [osyvokon/pavlick-formality-scores](https://huggingface.co/datasets/osyvokon/pavlick-formality-scores) were also included to help the model classify English inputs.

## Results

Accuracy on the test set:

| Language | Accuracy |
| --- | --- |
| All | 88.93% |
| English | 79.20% |
| French | 100% |
| German | 97.73% |
| Italian | 97.83% |
| Portuguese | 100% |
| Spanish | 98.53% |
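
A per-language breakdown like the one in the table can be computed along these lines. This is an illustrative sketch, not the evaluation script used for this model: the function name `accuracy_by_language` and the toy records are made up, and the overall score is assumed to be pooled over all samples, so languages with more test samples weigh more.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute accuracy per language plus a pooled "all" score.

    records: iterable of (language, gold_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, gold, predicted in records:
        # Count every sample once for its language and once for the pooled score.
        for key in (language, "all"):
            total[key] += 1
            correct[key] += int(gold == predicted)
    return {key: correct[key] / total[key] for key in total}


# Made-up records for illustration only; this is not the model's test set.
toy_records = [
    ("German", "formal", "formal"),
    ("German", "informal", "informal"),
    ("English", "neutral", "formal"),
    ("English", "formal", "formal"),
]

scores = accuracy_by_language(toy_records)
for language, score in scores.items():
    print(f"{language}: {score:.2%}")
```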
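
Confusion Matrix:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b999a40b24527e9c25583a/MSjaWwo5Xmxn9ETBt-bHa.png)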
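
By Language:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b999a40b24527e9c25583a/3OpRhIZbkemrNWxyYcnbi.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b999a40b24527e9c25583a/-6txXZHPlyaeh31QSj30M.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b999a40b24527e9c25583a/4l_x2ygGMfDhIIqQkbcGM.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b999a40b24527e9c25583a/CG6eqxsIYKXazPJWzQdQP.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b999a40b24527e9c25583a/bGKmDasZ0oiG86NEgbrEr.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b999a40b24527e9c25583a/QsCX6fC9jj4ZqCmoiRf4t.png)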