Update README.md
# Multilingual Hate Speech Classifier for Social Media Content
A multilingual [XLM-R-based](https://huggingface.co/FacebookAI/xlm-roberta-large) hate speech classification model for social media content, trained on English, Italian, and Slovenian data. Paper out soon...
**Training data**
* 103k English YouTube comments
* 119k Italian YouTube comments
* 50k Slovenian Twitter comments
**Evaluation data**
* 20k English YouTube comments
* 21k Italian YouTube comments
* 10k Slovenian Twitter comments
**Fine-tuning hyperparameters**
* `num_train_epochs=3`
* `train_batch_size=8`
* `learning_rate=6e-6`
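To give a rough sense of training length, the hyperparameters and training set sizes above can be turned into an approximate number of optimizer steps. This is only a back-of-the-envelope sketch: it assumes the rounded "k" figures listed under training data, joint training on all three languages, a single device, and no gradient accumulation, none of which the README states explicitly.

```python
import math

# Rounded training set sizes as listed in the README (assumed exact here).
TRAIN_EXAMPLES = {"English": 103_000, "Italian": 119_000, "Slovenian": 50_000}

# Hyperparameters as listed in the README.
NUM_TRAIN_EPOCHS = 3
TRAIN_BATCH_SIZE = 8

# Assumption: one device and no gradient accumulation, so one batch = one step.
total_examples = sum(TRAIN_EXAMPLES.values())
steps_per_epoch = math.ceil(total_examples / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_TRAIN_EPOCHS

print(total_examples, steps_per_epoch, total_steps)  # 272000 34000 102000
```

With a different effective batch size (multiple GPUs or gradient accumulation), the step count scales down accordingly.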
**Evaluation Results**
Model accuracy vs. inter-annotator agreement (0 = no agreement; 100 = perfect agreement):
| Language | Model-annotator Agreement | Inter-annotator Agreement |
|-----------|---------------------------|---------------------------|
| English | 79.97 | 82.91 |
| Italian | 82.00 | 81.79 |
| Slovenian | 78.84 | 79.43 |
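The README does not say exactly how the agreement figures were computed. As an illustration only, the sketch below shows simple percent agreement between two label sequences, which is one common way to put model-annotator and inter-annotator agreement on the same 0 to 100 scale. The function name `percent_agreement` and the example labels are hypothetical.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items (0-100) on which two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical labels for five comments (not taken from the datasets).
annotator = ["Appropriate", "Offensive", "Appropriate", "Violent", "Inappropriate"]
model = ["Appropriate", "Offensive", "Appropriate", "Offensive", "Inappropriate"]

print(percent_agreement(annotator, model))  # 80.0
```

Comparing the model against one annotator and one annotator against another with the same function gives two directly comparable numbers.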
Class-specific model F1-scores:
| Language | Appropriate | Inappropriate | Offensive | Violent |
|-----------|-------------|---------------|-----------|---------|
| English | 86.10 | 39.16 | 68.24 | 27.82 |
| Italian | 89.77 | 58.45 | 60.42 | 44.97 |
| Slovenian | 84.30 | 45.22 | 69.69 | 24.79 |
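For readers who want to reproduce this kind of table on their own data, a minimal sketch of per-class F1 follows. It uses the four class names from the table header; the helper name `per_class_f1` and the example labels are hypothetical, and the authors' actual evaluation script may differ.

```python
from collections import Counter

LABELS = ["Appropriate", "Inappropriate", "Offensive", "Violent"]


def per_class_f1(gold, pred, labels=LABELS):
    """Return {label: F1 on a 0-100 scale} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 100.0 * 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical gold and predicted labels (not taken from the datasets).
gold = ["Appropriate", "Appropriate", "Offensive", "Violent"]
pred = ["Appropriate", "Offensive", "Offensive", "Appropriate"]
scores = per_class_f1(gold, pred)
print({label: round(value, 2) for label, value in scores.items()})
```

Rare classes such as Violent have few positive examples, so a handful of errors moves their F1 far more than it moves the F1 of the majority class.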
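**Usage**

    from transformers import AutoModelForSequenceClassification, TextClassificationPipeline, AutoTokenizer, AutoConfig

    # Model ID as given in this README; replace it with this repository's ID if it differs.
    MODEL = "classla/xlm-r-parlasent"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    config = AutoConfig.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)

    # function_to_apply="none" returns raw logits instead of probabilities.
    # device=0 selects the first GPU; use device=-1 to run on CPU.
    # Newer transformers releases deprecate return_all_scores in favor of top_k=None.
    pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True,
                                      task='sentiment_analysis', device=0, function_to_apply="none")

    # One input string per list item; note the comma after every item.
    pipe([
        "Thank you for using our model",
        "Grazie per aver utilizzato il nostro modello",
        "Hvala za uporabo našega modela",
    ])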
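Because the pipeline above is created with `function_to_apply="none"`, it returns raw logits rather than probabilities. The sketch below shows one way to turn such output into a single predicted label with a softmax. The label names are taken from the F1 table and the logit values are made up; the model's real `id2label` mapping may use different names (for example `LABEL_0`), so treat both as assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(scores):
    """Pick the most probable label from one pipeline result.

    `scores` is a list of {"label": str, "score": float} dicts holding
    raw logits, i.e. the per-input output when all scores are returned.
    """
    probs = softmax([item["score"] for item in scores])
    best = max(range(len(scores)), key=probs.__getitem__)
    return scores[best]["label"], probs[best]


# Hypothetical logits for one comment (label names are an assumption).
example = [
    {"label": "Appropriate", "score": 3.1},
    {"label": "Inappropriate", "score": -0.4},
    {"label": "Offensive", "score": 0.2},
    {"label": "Violent", "score": -2.0},
]
label, prob = top_label(example)
print(label, round(prob, 3))
```

Applying `top_label` to each element of the list returned by `pipe([...])` yields one label and probability per input text.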
@@ -1,3 +1,18 @@
----
-license: mit
-
+---
+license: mit
+language:
+- multilingual
+- en
+- it
+- sl
+metrics:
+- f1
+- accuracy
+base_model: FacebookAI/xlm-roberta-large
+pipeline_tag: text-classification
+tags:
+- hate-speech
+- xlm-roberta
+- Youtube
+- Twitter
+---