---
license: mit
language:
- en
- de
- es
- fr
- it
- pt
metrics:
- accuracy
base_model:
- microsoft/mdeberta-v3-base
pipeline_tag: text-classification
tags:
- formal or informal classification
widget:
- text: Bitte geh einkaufen.
- text: Können Sie mir spontan dabei helfen?
- text: Als nächstes kommen 4g Champignons und 500g Mehl dazu.
---
# formality-classifier-mdeberta-v3-base
This model classifies text by formality. It assigns each input to one of three classes, `["formal", "informal", "neutral"]`, where `neutral` covers texts with no clear formality, such as passive statements.
Training data was selected and generated with a focus on languages that have a distinct formal form of address, including French, German, Italian, Portuguese and Spanish.
Some samples from [osyvokon/pavlick-formality-scores](https://huggingface.co/datasets/osyvokon/pavlick-formality-scores) were also used in an attempt to teach the model to classify English inputs.
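
A minimal usage sketch with the `transformers` text-classification pipeline is shown below. The repository id `LenDigLearn/formality-classifier-mdeberta-v3-base` is an assumption pieced together from the model name, and the code assumes the model config maps its outputs to the label strings `formal`, `informal` and `neutral`. The helper names `top_label` and `classify` are illustrative only.

```python
# Assumed repository id; adjust it if the model lives under a different name.
MODEL_ID = "LenDigLearn/formality-classifier-mdeberta-v3-base"


def top_label(scores):
    """Return (label, score) for the highest-scoring class.

    `scores` is a list of {"label": str, "score": float} dicts,
    one per class, as returned by the pipeline with top_k=None.
    """
    best = max(scores, key=lambda entry: entry["score"])
    return best["label"], best["score"]


def classify(texts, model_id=MODEL_ID):
    """Classify a list of strings and return one (label, score) pair per text."""
    # Imported lazily so that the helper above can be used without transformers.
    from transformers import pipeline

    # top_k=None asks the pipeline for the scores of all classes.
    classifier = pipeline("text-classification", model=model_id, top_k=None)
    return [top_label(scores) for scores in classifier(list(texts))]
```

Calling `classify(["Bitte geh einkaufen.", "Können Sie mir spontan dabei helfen?"])` should return one label and score per sentence. The first call downloads the model weights.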
## Results
Accuracy on the test set:
| Language | Accuracy |
| --- | --- |
| all | 88.93% |
| English | 79.20% |
| French | 100% |
| German | 97.73% |
| Italian | 97.83% |
| Portuguese | 100% |
| Spanish | 98.53% |
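
The sketch below shows one way a table like this can be computed, assuming the `all` row is the accuracy over the pooled test samples rather than the mean of the per-language values. The function name `accuracy_by_language` and the toy records are hypothetical and are not taken from the actual evaluation.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute accuracy per language and over all samples.

    `records` is an iterable of (language, gold_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, gold, predicted in records:
        # Count every sample once for its language and once for the pooled row.
        for key in (language, "all"):
            total[key] += 1
            correct[key] += int(gold == predicted)
    return {key: correct[key] / total[key] for key in total}


# Toy data for illustration only.
toy_records = [
    ("English", "formal", "formal"),
    ("English", "neutral", "informal"),
    ("English", "informal", "informal"),
    ("English", "neutral", "neutral"),
    ("German", "formal", "formal"),
]
print(accuracy_by_language(toy_records))
# {'English': 0.75, 'all': 0.8, 'German': 1.0}
```

With pooled counting, a language that contributes many samples pulls the overall figure toward its own accuracy. That would be consistent with the `all` row sitting closer to the English result than to the others, although the test set composition is not stated here.
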
Confusion Matrix:
![](confusion_matrix.svg)
By Language:
![](confusion_matrix_en.svg)
![](confusion_matrix_fr.svg)
![](confusion_matrix_de.svg)
![](confusion_matrix_it.svg)
![](confusion_matrix_pt.svg)
![](confusion_matrix_es.svg)
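
For reference, a confusion matrix over the three classes can be built as sketched below. The layout (rows are true labels, columns are predicted labels) is an assumption and may differ from the orientation of the plots above. The function name and the toy labels are illustrative.

```python
LABELS = ["formal", "informal", "neutral"]


def confusion_matrix(gold, predicted, labels=LABELS):
    """Return a matrix where entry [i][j] counts samples with
    true label labels[i] that were predicted as labels[j]."""
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for gold_label, predicted_label in zip(gold, predicted):
        matrix[index[gold_label]][index[predicted_label]] += 1
    return matrix


# Toy data for illustration only.
toy_gold = ["formal", "formal", "informal", "neutral", "neutral"]
toy_predicted = ["formal", "informal", "informal", "neutral", "formal"]
for label, row in zip(LABELS, confusion_matrix(toy_gold, toy_predicted)):
    print(label, row)
```

Filtering the gold and predicted lists by language before calling the function gives the per-language variants.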