This model is a fine-tuned bert-base-uncased transformer for multi-label classification on the Jigsaw Toxic Comment Classification Challenge dataset.
It independently predicts seven toxicity-related labels per comment (listed in the table below). Basic usage with the `transformers` library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Koushim/bert-multilabel-jigsaw-toxic-classifier")
model = AutoModelForSequenceClassification.from_pretrained("Koushim/bert-multilabel-jigsaw-toxic-classifier")

text = "You are a wonderful person!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
outputs = model(**inputs)

# Multi-label setup: apply a sigmoid to get an independent probability per label
probs = torch.sigmoid(outputs.logits)
print(probs)
```
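Continuing from `probs` above, a fixed cutoff turns the per-label probabilities into predicted label names. A minimal sketch; the 0.5 threshold and the label list (taken from the index table below) are illustrative choices, not values shipped with the model:

```python
# Illustrative post-processing: keep labels whose probability exceeds 0.5.
# Label order follows the index table below.
id2label = ["toxicity", "severe_toxicity", "obscene", "threat",
            "insult", "identity_attack", "sexual_explicit"]
predicted = [id2label[i] for i, p in enumerate(probs[0]) if p > 0.5]
print(predicted)  # expected to be empty for this friendly example
```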
The output indices map to labels as follows:

| Index | Label |
|-------|-------|
| 0 | toxicity |
| 1 | severe_toxicity |
| 2 | obscene |
| 3 | threat |
| 4 | insult |
| 5 | identity_attack |
| 6 | sexual_explicit |
The model class is `BertForSequenceClassification` with `problem_type="multi_label_classification"`, which trains with a binary cross-entropy loss over independent sigmoid outputs rather than a softmax.

Repository contents:

- `pytorch_model.bin` - trained model weights
- `config.json` - model configuration
- `tokenizer.json`, `vocab.txt` - tokenizer files
- `README.md` - this file

You can fine-tune this model using the Hugging Face Trainer API with your own dataset or the original Jigsaw dataset, as sketched below.
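A minimal fine-tuning sketch with the Trainer API. It assumes a local `train.csv` whose columns are `comment_text` plus the seven label columns from the table above; the file path, batch size, epoch count, and learning rate are illustrative placeholders, not the settings used to train this model:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

LABELS = ["toxicity", "severe_toxicity", "obscene", "threat",
          "insult", "identity_attack", "sexual_explicit"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # switches the loss to BCEWithLogitsLoss
)

# Assumed CSV layout: a comment_text column plus one column per label
dataset = load_dataset("csv", data_files={"train": "train.csv"})["train"]

def preprocess(batch):
    enc = tokenizer(batch["comment_text"], truncation=True,
                    padding="max_length", max_length=128)
    # Multi-label targets must be float vectors for BCEWithLogitsLoss
    enc["labels"] = [[float(batch[label][i]) for label in LABELS]
                     for i in range(len(batch["comment_text"]))]
    return enc

dataset = dataset.map(preprocess, batched=True,
                      remove_columns=dataset.column_names)

args = TrainingArguments(output_dir="bert-jigsaw-multilabel",
                         per_device_train_batch_size=16,
                         num_train_epochs=2,
                         learning_rate=2e-5)

Trainer(model=model, args=args, train_dataset=dataset).train()
```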
If you use this model in your research or project, please cite:
```bibtex
@article{devlin2019bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2019}
}
```
Apache 2.0 License