Bojan committed on
Commit
c46d81c
·
verified ·
1 Parent(s): ac70c3c

Update README.md

Files changed (1)
  1. README.md +53 -1
README.md CHANGED
@@ -15,4 +15,56 @@ tags:
  - xlm-roberta
  - Youtube
  - Twitter
- ---
+ ---
+
+ # Multilingual Hate Speech Classifier for Social Media Content
+
+ A multilingual hate speech classification model based on [XLM-R (100 languages)](https://huggingface.co/FacebookAI/xlm-roberta-large), fine-tuned on English, Italian and Slovenian data. A paper describing the model is forthcoming.
+
+ **Training data**
+ * 103k English YouTube comments
+ * 119k Italian YouTube comments
+ * 50k Slovenian Twitter comments
+
+ **Evaluation data**
+ * 20k English YouTube comments
+ * 21k Italian YouTube comments
+ * 10k Slovenian Twitter comments
+
+ **Fine-tuning hyperparameters**
+
+     num_train_epochs=3,
+     train_batch_size=8,
+     learning_rate=6e-6
+
+ **Evaluation results**
+ Model-annotator agreement (accuracy) vs. inter-annotator agreement, on a scale from 0 (no agreement) to 100 (perfect agreement):
+ | Language  | Model-annotator Agreement | Inter-annotator Agreement |
+ |-----------|---------------------------|---------------------------|
+ | English   | 79.97                     | 82.91                     |
+ | Italian   | 82.00                     | 81.79                     |
+ | Slovenian | 78.84                     | 79.43                     |
+
+ Class-specific model F1-scores:
+ | Language  | Appropriate | Inappropriate | Offensive | Violent |
+ |-----------|-------------|---------------|-----------|---------|
+ | English   | 86.10       | 39.16         | 68.24     | 27.82   |
+ | Italian   | 89.77       | 58.45         | 60.42     | 44.97   |
+ | Slovenian | 84.30       | 45.22         | 69.69     | 24.79   |
+
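For readers who want to reproduce this kind of comparison on their own annotations, the two metrics in the tables above can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation script: the helper names `accuracy` and `per_class_f1` are hypothetical, the toy labels are invented for the example, and it assumes agreement is plain accuracy (as the README states) and that F1 is computed one class at a time.

```python
from collections import Counter

# Class names taken from the F1 table; their order here is an assumption.
LABELS = ["Appropriate", "Inappropriate", "Offensive", "Violent"]


def accuracy(gold, pred):
    """Percentage of items on which the two label sequences agree."""
    if len(gold) != len(pred):
        raise ValueError("label sequences must have the same length")
    matches = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * matches / len(gold)


def per_class_f1(gold, pred, labels=LABELS):
    """F1 in percent for each class, treating that class as the positive one."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 100.0 * 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy labels, for illustration only (not the evaluation data above).
gold = ["Appropriate", "Offensive", "Appropriate", "Violent", "Inappropriate", "Offensive"]
pred = ["Appropriate", "Offensive", "Offensive", "Violent", "Appropriate", "Offensive"]

print(accuracy(gold, pred))
print(per_class_f1(gold, pred))
```

The same `accuracy` function covers both columns of the agreement table: pass model predictions and one annotator's labels for model-annotator agreement, or two annotators' labels for inter-annotator agreement.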
+ **Usage**
+
+     from transformers import AutoModelForSequenceClassification, TextClassificationPipeline, AutoTokenizer, AutoConfig
+
+     MODEL = "classla/xlm-r-parlasent"  # Hugging Face Hub model ID
+     tokenizer = AutoTokenizer.from_pretrained(MODEL)
+     config = AutoConfig.from_pretrained(MODEL)
+     model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+
+     pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True,
+                                       task='sentiment_analysis', device=0, function_to_apply="none")  # device=0: first GPU; "none": raw scores
+     pipe([
+         "Thank you for using our model",
+         "Grazie per aver utilizzato il nostro modello",
+         "Hvala za uporabo našega modela"
+     ])
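Because the pipeline above is created with `function_to_apply="none"`, it returns raw, unnormalized scores rather than probabilities. The sketch below shows one way to turn the scores for a single text into probabilities and pick the top class. It rests on assumptions: the helpers `to_probabilities` and `top_label` are hypothetical, the `LABEL_0` to `LABEL_3` names and score values are made up, and it presumes a four-class classification head matching the classes in the F1 table.

```python
import math


def to_probabilities(scores):
    """Softmax over one text's list of {"label", "score"} dicts (numerically stable)."""
    peak = max(item["score"] for item in scores)
    exps = [math.exp(item["score"] - peak) for item in scores]
    total = sum(exps)
    return [
        {"label": item["label"], "score": e / total}
        for item, e in zip(scores, exps)
    ]


def top_label(scores):
    """Label with the highest score."""
    return max(scores, key=lambda item: item["score"])["label"]


# Hypothetical raw output for a single input text (invented values and label names).
raw = [
    {"label": "LABEL_0", "score": 2.0},
    {"label": "LABEL_1", "score": 0.5},
    {"label": "LABEL_2", "score": -1.0},
    {"label": "LABEL_3", "score": -2.0},
]

probs = to_probabilities(raw)
print(top_label(probs))
print(probs)
```

Softmax preserves the ranking of the raw scores, so the top label is the same before and after; the conversion only matters when a calibrated-looking score or a threshold is needed.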