---
library_name: transformers
language:
- en
- fr
- it
- es
- ru
- uk
- tt
- ar
- hi
- ja
- zh
- he
- am
- de
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
metrics:
- f1
base_model:
- google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- toxic
---
## Multilingual Toxicity Classifier for 15 Languages (2025)
This is an instance of [bert-base-multilingual-cased](https://huggingface.co/google-bert/bert-base-multilingual-cased) fine-tuned for binary toxicity classification on our updated (2025) dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).

The model now covers 15 languages from various language families:

| Language | Code | F1 Score |
|-----------|------|----------|
| English | en | 0.9035 |
| Russian | ru | 0.9224 |
| Ukrainian | uk | 0.9461 |
| German | de | 0.5181 |
| Spanish | es | 0.7291 |
| Arabic | ar | 0.5139 |
| Amharic | am | 0.6316 |
| Hindi | hi | 0.7268 |
| Chinese | zh | 0.6703 |
| Italian | it | 0.6485 |
| French | fr | 0.9125 |
| Hinglish | hin | 0.6850 |
| Hebrew | he | 0.8686 |
| Japanese | ja | 0.8644 |
| Tatar | tt | 0.6170 |
## How to use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')
model = AutoModelForSequenceClassification.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')

# Tokenize the input text (returns input_ids and attention_mask)
batch = tokenizer("You are amazing!", return_tensors="pt")

with torch.no_grad():
    output = model(**batch)

# idx 0 for neutral, idx 1 for toxic
prediction = output.logits.argmax(dim=-1).item()
```
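The classifier returns raw logits; to report a toxicity probability rather than just a hard label, apply a softmax over the two classes. A minimal sketch, using dummy logits in place of the real `output.logits` tensor from the model above:

```python
import torch

# Dummy logits standing in for output.logits
# (shape [batch_size, 2]; idx 0 = neutral, idx 1 = toxic)
logits = torch.tensor([[2.3, -1.7],
                       [-0.4, 1.9]])

# Softmax over the class dimension gives per-class probabilities
probs = torch.softmax(logits, dim=-1)
labels = probs.argmax(dim=-1)  # 0 = neutral, 1 = toxic

for p, label in zip(probs, labels):
    verdict = "toxic" if label.item() == 1 else "neutral"
    print(f"toxic probability: {p[1].item():.3f} -> {verdict}")
```

The same pattern works for batched inputs: tokenize a list of texts with `padding=True` and the softmax/argmax applies row-wise across the batch.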
## Citation
The model is prepared for [TextDetox 2025 Shared Task](https://pan.webis.de/clef25/pan25-web/text-detoxification.html) evaluation.
Citation TBD.