dardem committed on
Commit 8935103 · 1 Parent(s): 47c7b45

Create README.md

  1. README.md +26 -0
README.md ADDED
@@ -0,0 +1,26 @@
+ ---
+ language:
+ - en
+ tags:
+ - toxic comments classification
+ licenses:
+ - cc-by-nc-sa
+ ---
+ ## Toxicity Classification Model (first part of the data)
+ This model is trained for the toxicity classification task. The training data is the merge of the English parts of three datasets by **Jigsaw** ([Jigsaw 2018](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge), [Jigsaw 2019](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), [Jigsaw 2020](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification)), containing around 2 million examples. We split it into two parts and fine-tune a RoBERTa model ([RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)) on each part. THIS MODEL WAS FINE-TUNED ON THE FIRST PART. The two classifiers perform similarly on the test set of the first Jigsaw competition, reaching an **AUC-ROC** of 0.98 and an **F1-score** of 0.76.
+ ## How to use
+ ```python
+ from transformers import RobertaTokenizer, RobertaForSequenceClassification
+ # load the tokenizer and model weights; note that an auth token is required here
+ tokenizer = RobertaTokenizer.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier', use_auth_token=True)
+ model = RobertaForSequenceClassification.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier', use_auth_token=True)
+ # prepare the input: encode the text as a tensor of token ids
+ batch = tokenizer.encode('you are amazing', return_tensors='pt')
+ # inference: returns the raw classification scores for the input
+ model(batch)
+ ```
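The snippet above stops at the raw model output. As a minimal sketch of one way to turn such scores into a toxicity probability, the helper below applies a softmax to a pair of logits. The function names, the example logits, and in particular the assumption that index 1 is the toxic class are illustrative and not stated in this README; check `model.config.id2label` before relying on the label order.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# ASSUMPTION: index 1 is the "toxic" class. The README does not state the
# label order, so verify it against model.config.id2label.
TOXIC_INDEX = 1


def toxicity_probability(logits, toxic_index=TOXIC_INDEX):
    """Return the softmax probability assigned to the (assumed) toxic class."""
    return softmax(logits)[toxic_index]


# Hypothetical logits for illustration only, not real model output.
example_logits = [2.0, -1.0]
print(toxicity_probability(example_logits))
```

In recent versions of `transformers`, the logits for the single input above should be available as `model(batch).logits[0].tolist()`, which can be passed directly to `toxicity_probability`.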
+ ## Licensing Information
+ [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
+ [![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
+ [cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
+ [cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png