Azerbaijani Toxicity Classifier
This is a multi-label text classification model fine-tuned to detect various types of toxicity in Azerbaijani text. The model is based on mDeBERTa-v3 (microsoft/mdeberta-v3-base) and can identify the following categories:
- toxicity (overall toxicity)
- severe_toxicity
- obscene
- threat
- insult
- identity_attack
- sexual_explicit
Model Description
This model is designed for content moderation and analysis of online communication in the Azerbaijani language. It takes a string of text as input and returns a probability score for each of the seven toxicity categories. This allows for nuanced moderation, distinguishing between general insults, threats, and sexually explicit content.
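If you only need a quick check of this behavior, the `transformers` pipeline API can return all seven scores at once. The snippet below is a minimal sketch rather than the card's official usage example, and it assumes the checkpoint's `id2label` mapping carries the label names listed above:

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="LocalDoc/azerbaijani_toxicity_classifier",
)

# top_k=None returns every label; sigmoid gives each category an
# independent probability, which is what a multi-label model needs.
results = classifier(
    "Sən nə yaramaz adamsan",
    top_k=None,
    function_to_apply="sigmoid",
)

# Depending on the transformers version, a single input may come back
# wrapped in an extra list; unwrap it if so.
if results and isinstance(results[0], list):
    results = results[0]

for item in results:
    print(f"{item['label']}: {item['score']:.4f}")
```

The manual tokenizer/model workflow in the next section gives you more control over devices, batching, and thresholds.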
How to Use
You can use this model directly with the `transformers` library. First, make sure you have the necessary libraries installed:
```bash
pip install transformers torch
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def classify_toxicity():
    # Roughly: "What a no-good person you are" (Azerbaijani)
    text_to_classify = "Sən nə yaramaz adamsan"
    model_id = "LocalDoc/azerbaijani_toxicity_classifier"

    # The order of labels must match the model's output
    label_names = [
        'identity_attack',
        'insult',
        'obscene',
        'severe_toxicity',
        'sexual_explicit',
        'threat',
        'toxicity'
    ]

    print(f"Loading model: {model_id}...")
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForSequenceClassification.from_pretrained(model_id)
    except Exception as e:
        print(f"Error loading model. Make sure the repository '{model_id}' is public and contains the model files.")
        print(f"Details: {e}")
        return

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    model.eval()
    print(f"Model loaded successfully on {device}.")

    # Tokenize the input and move the tensors to the same device as the model
    inputs = tokenizer(
        text_to_classify,
        truncation=True,
        padding=True,
        return_tensors='pt'
    ).to(device)

    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Apply sigmoid to get an independent probability for each category
    probabilities = torch.sigmoid(outputs.logits).cpu().numpy()[0]

    threshold = 0.5
    overall_toxicity_score = probabilities[-1]  # The last label is 'toxicity'
    is_toxic = overall_toxicity_score > threshold

    print("\n" + "=" * 50)
    print("TOXICITY ANALYSIS RESULTS")
    print("=" * 50)
    print(f"Text: {text_to_classify}")

    status = "TOXIC" if is_toxic else "NOT TOXIC"
    print(f"\nOverall Status: {status} (Confidence: {overall_toxicity_score:.3f})")

    print("\nCategory Scores:")
    for i, category in enumerate(label_names):
        score = probabilities[i]
        formatted_name = category.replace('_', ' ').capitalize()
        print(f"  - {formatted_name:<20}: {score:.4f}")
    print("=" * 50)


if __name__ == "__main__":
    classify_toxicity()
```
Example output:

```
==================================================
TOXICITY ANALYSIS RESULTS
==================================================
Text: Sən nə yaramaz adamsan

Overall Status: TOXIC (Confidence: 0.987)

Category Scores:
  - Identity attack     : 0.0004
  - Insult              : 0.9878
  - Obscene             : 0.0056
  - Severe toxicity     : 0.0002
  - Sexual explicit     : 0.0002
  - Threat              : 0.0006
  - Toxicity            : 0.9875
==================================================
```
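For moderating a stream of messages, the same tokenizer and model can score a whole batch in one forward pass. The sketch below reuses the sigmoid-plus-threshold logic from the script above; the `classify_batch` helper and the example texts are illustrative additions, not part of the original script:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "LocalDoc/azerbaijani_toxicity_classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# Same label order as in the script above
label_names = ['identity_attack', 'insult', 'obscene', 'severe_toxicity',
               'sexual_explicit', 'threat', 'toxicity']


def classify_batch(texts, threshold=0.5):
    """Score a list of texts and flag each one using the overall 'toxicity' label."""
    inputs = tokenizer(texts, truncation=True, padding=True, return_tensors='pt')
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits).numpy()
    results = []
    for text, row in zip(texts, probs):
        scores = dict(zip(label_names, row.tolist()))
        results.append({"text": text,
                        "scores": scores,
                        "is_toxic": scores["toxicity"] > threshold})
    return results


# Example texts are placeholders; "Salam, necəsən?" is a harmless greeting.
for result in classify_batch(["Salam, necəsən?", "Sən nə yaramaz adamsan"]):
    print(result["text"], "->", "TOXIC" if result["is_toxic"] else "NOT TOXIC")
```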
Intended Use & Limitations
This model is intended to be used as a tool for content moderation to help flag potentially harmful content for human review.
Limitations:
- The model may struggle with sarcasm, irony, or other forms of nuanced language.
- Its performance is dependent on the data it was trained on and may exhibit biases present in that data.
- It should not be used to make fully automated, final decisions about content or users without a human-in-the-loop.
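One way to keep a human in the loop is to treat the scores as routing signals rather than verdicts. The sketch below is purely illustrative: the `route_for_moderation` function and both thresholds are placeholders, not values recommended by the model authors, and should be calibrated on your own data.

```python
def route_for_moderation(scores, review_threshold=0.5, auto_hide_threshold=0.95):
    """Turn per-category probabilities into a moderation action.

    Both thresholds are illustrative placeholders; tune them on your own
    moderation data.
    """
    if scores["toxicity"] >= auto_hide_threshold:
        # Hide immediately, but still send it to a reviewer rather than
        # treating the prediction as a final decision.
        return "hide_and_queue_for_review"
    if scores["toxicity"] >= review_threshold or scores["threat"] >= review_threshold:
        return "queue_for_review"
    return "allow"


# Example with the scores from the output above
print(route_for_moderation({"toxicity": 0.9875, "threat": 0.0006}))
# -> hide_and_queue_for_review
```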
Training
The model was fine-tuned on a private dataset of Azerbaijani text labeled for the seven toxicity categories mentioned above. It was trained as a multi-label classification task, where each text can belong to one, multiple, or no categories.
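The dataset and exact hyperparameters are not published. Purely as an illustration of the multi-label setup described above, a sequence classification head on the base checkpoint is commonly configured like this with `transformers`; this is an assumption about the general recipe, not the author's actual training code:

```python
import torch
from transformers import AutoModelForSequenceClassification

# Multi-label head on the base checkpoint: 7 independent outputs, trained
# with per-label binary cross-entropy (BCEWithLogitsLoss) when
# problem_type is set to "multi_label_classification".
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/mdeberta-v3-base",
    num_labels=7,
    problem_type="multi_label_classification",
)

# Targets are multi-hot float vectors, so a text can belong to one,
# several, or none of the categories. Example: 'insult' + 'toxicity'
# (label order: identity_attack, insult, obscene, severe_toxicity,
#  sexual_explicit, threat, toxicity).
example_labels = torch.tensor([[0., 1., 0., 0., 0., 0., 1.]])
```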
License
This model is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to share and adapt the material for any purpose, even commercially, as long as you give appropriate credit. For more details, see the license terms.
Contact
For more information, questions, or issues, please contact LocalDoc at [[email protected]].