Commit 4fb2af1 by Exqrch · verified · 1 Parent(s): 0104725

Update README.md

Files changed (1)
  1. README.md +66 -3
README.md CHANGED
@@ -1,3 +1,66 @@
1
- ---
2
- license: cc-by-sa-4.0
3
- ---
1
+ ---
2
+ license: cc-by-sa-4.0
3
+ ---
4
+
5
+ # IndoBERTweet-HateSpeech
6
+
7
+ ## Model Description
8
+ IndoBERTweet fine-tuned on the IndoToxic2024 dataset, reaching an accuracy of 0.89 and a macro-F1 of 0.78. Both figures were obtained through stratified 10-fold cross-validation.
9
+
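The two metrics quoted in the description can be computed as in the sketch below. It is plain Python and only illustrates the definitions: the function names `accuracy` and `macro_f1` and the toy labels are assumptions made for this example, not part of the model's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Because macro-F1 weights every class equally, it can sit below accuracy when the rarer class is predicted less reliably, as in the toy labels here.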
10
+ ## Supported Tokenizer
11
+ - **indolem/indobertweet-base-uncased**
12
+
13
+ ## Example Code
14
+ ```python
15
+ import torch
16
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
17
+
18
+ # Specify the model and tokenizer name
19
+ model_name = "Exqrch/IndoBERTweet-HateSpeech"
20
+ tokenizer_name = "indolem/indobertweet-base-uncased"
21
+
22
+ # Load the fine-tuned classification model
23
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
24
+
25
+ # Load the tokenizer
26
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
27
+
28
+ text = "selamat pagi semua!"
29
+
30
+ output = model(**tokenizer(text, return_tensors="pt"))
31
+ logits = output.logits
32
+
33
+ # Get the predicted class label
34
+ predicted_class = torch.argmax(logits, dim=-1).item()
35
+
36
+ print(predicted_class)
37
+ # --- Output ---
38
+ # > 0
39
+ # --- End of Output ---
40
+ ```
41
+
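The example code takes the `argmax` of the logits, which gives a label but no confidence. If a probability per class is also wanted, a softmax over the logits is the usual step. The sketch below uses plain Python so it runs without downloading the model; the logit values are hypothetical stand-ins for `output.logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a two-class output; not produced by the model.
logits = [2.0, -1.0]

probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class, [round(p, 3) for p in probs])
```

The predicted class is the same one `torch.argmax` would return, since softmax preserves the ordering of the logits.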
42
+ ## Limitations
43
+ The model was trained only on Indonesian text; its performance on code-switched text is not known.
44
+
45
+ ## Sample Output
46
+ ```
47
+ Model name: Exqrch/IndoBERTweet-HateSpeech
48
+ Text 1: Kenapa sih mereka berantem terus?
49
+ Prediction: 0
50
+ Text 2: Orang gila emang elu!
51
+ Prediction: 1
52
+ ```
53
+
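A report in the layout shown under Sample Output can be assembled with a small helper. This is a sketch only: `format_report` is a hypothetical name, and the predictions are passed in as plain integers, so how they are obtained is left to the Example Code section.

```python
def format_report(model_name, texts, predictions):
    """Build a report with one 'Text N' / 'Prediction' pair per input."""
    lines = [f"Model name: {model_name}"]
    for i, (text, pred) in enumerate(zip(texts, predictions), start=1):
        lines.append(f"Text {i}: {text}")
        lines.append(f"Prediction: {pred}")
    return "\n".join(lines)


# Texts and predictions copied from the Sample Output section.
report = format_report(
    "Exqrch/IndoBERTweet-HateSpeech",
    ["Kenapa sih mereka berantem terus?", "Orang gila emang elu!"],
    [0, 1],
)
print(report)
```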
54
+ ## Citation
55
+ If you use this model, please cite:
56
+ ```
57
+ @article{susanto2024indotoxic2024,
58
+ title={IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language},
59
+ author={Lucky Susanto and Musa Izzanardi Wijanarko and Prasetia Anugrah Pratama and Traci Hong and Ika Idris and Alham Fikri Aji and Derry Wijaya},
60
+ year={2024},
61
+ eprint={2406.19349},
62
+ archivePrefix={arXiv},
63
+ primaryClass={cs.CL},
64
+ url={https://arxiv.org/abs/2406.19349},
65
+ }
66
+ ```