Model Detail Information

1. Overview

This model is trained to detect the presence of harmful expressions in Korean sentences.
It performs binary classification to determine whether a given sentence contains hateful expressions or is a general, non-hateful sentence.
The task type is text classification, and the model was fine-tuned on the TTA-DQA/hate_sentence dataset.

The classification labels are:

"0": "no_hate"
"1": "hate"

2. Training Information

Base Model: KcElectra (a pre-trained Korean language model based on Electra)
Source: beomi/KcELECTRA-base-v2022 (https://huggingface.co/beomi/KcELECTRA-base-v2022)
Model Type: Causal Language Model
Pre-training (Korean): Approximately 17GB (over 180 million sentences)
Fine-tuning (hate dataset): Approximately 22.3MB (TTA-DQA/hate_sentence)
Learning Rate: 5e-6
Weight Decay: 0.01
Epochs: 20
Batch Size: 16
Data Loader Workers: 2
Tokenizer: BertWordPieceTokenizer
Model Size: Approximately 512MB
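
The hyperparameters above can be collected into a single configuration. The sketch below is only an illustration: it assumes the Hugging Face Trainer API was used, which this card does not state, and it treats the batch size of 16 as a per-device value. The output directory name is hypothetical.

```python
# Hyperparameters copied from the list above. The key names follow
# transformers.TrainingArguments; that mapping is an assumption.
HYPERPARAMS = {
    "learning_rate": 5e-6,
    "weight_decay": 0.01,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 16,  # assumed per-device, not global
    "dataloader_num_workers": 2,
}


def build_training_args(output_dir: str = "./hate-kcelectra"):
    """Build TrainingArguments from the values above (output_dir is hypothetical)."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Keeping the values in a plain dictionary makes it easy to log them alongside a run or to pass them to a different training loop.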

3. Requirements

To use this model, ensure the following dependencies are installed:

pytorch ~= 1.8.0
transformers ~= 4.11.3
emoji ~= 0.6.0
soynlp ~= 0.0.493
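
A quick way to see whether an environment matches these pins is to compare installed versions against them. The following is a minimal sketch, not an official check: it approximates the `~=` (compatible release) operator by hand, ignores pre-release and other suffixes, and assumes the PyTorch pin refers to the pip distribution named `torch`.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pins copied from the list above; "torch" is the pip name for PyTorch.
PINS: Dict[str, str] = {
    "torch": "1.8.0",
    "transformers": "4.11.3",
    "emoji": "0.6.0",
    "soynlp": "0.0.493",
}


def _release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release, e.g. '1.8.1+cu111' -> (1, 8, 1)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible(installed: str, pinned: str) -> bool:
    """Approximate `~= pinned`: same leading components and at least the pinned release."""
    have, want = _release(installed), _release(pinned)
    have = have + (0,) * (len(want) - len(have))  # treat '1.8' like '1.8.0'
    prefix = want[:-1]
    return have[: len(prefix)] == prefix and have >= want


def check_environment() -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each pinned package to (installed version or None, whether it satisfies the pin)."""
    report = {}
    for name, pinned in PINS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, is_compatible(installed, pinned))
    return report
```

Calling `check_environment()` returns one entry per package, so a mismatch can be reported before the model is loaded.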

4. Quick Start

python

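from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "TTA-DQA/HateDetection-KcElectra-FineTuning"

# Load the tokenizer and the model with its classification head,
# so that the outputs include logits for the two labels.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)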
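
Turning a sentence into one of the two labels takes a few more steps after loading: tokenize, run a forward pass, and pick the higher-scoring class. The sketch below assumes the checkpoint ships a sequence-classification head that loads with `AutoModelForSequenceClassification` and that its class indices follow the label mapping in the Overview. The `classify` helper is illustrative, not part of the released model.

```python
from typing import List, Tuple

MODEL_ID = "TTA-DQA/HateDetection-KcElectra-FineTuning"

# Label mapping as listed in the Overview section.
ID2LABEL = {0: "no_hate", 1: "hate"}


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Tuple[str, float]]:
    """Return (label, probability) for each input sentence."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint includes a classification head.
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits

    # Softmax over the two classes, then keep the most likely one.
    probs = torch.softmax(logits, dim=-1)
    scores, ids = probs.max(dim=-1)
    return [(ID2LABEL[i], s) for i, s in zip(ids.tolist(), scores.tolist())]


# Example call (downloads the model on first use):
# classify(["오늘 날씨가 정말 좋네요."])
```

The function reloads the model on every call for simplicity; for repeated use, loading the tokenizer and model once and reusing them would be the more practical choice.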

5. Citation

This model was developed as part of the Quality Validation Project for Super-Giant AI Training Data (305-2100-2131, 2024 Quality Validation for Super-Giant AI Training).

6. Bias, Risks, and Limitations

The determination of harmful expressions may vary depending on language, culture, application context, and personal perspectives.
Results may reflect biases or lead to controversy due to the subjective nature of evaluating harmful content.
This model's outputs should not be treated as a definitive standard for identifying harmful expressions.

7. Results

Type: binary classification (text-classification)
F1-score: 0.9928
Accuracy: 0.9928
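
For reference, the two metrics can be computed from gold and predicted labels as sketched below. This is an illustration only: the card does not say how the F1 score was averaged or which evaluation split was used, so the sketch assumes binary F1 with "hate" (label 1) as the positive class, and the toy labels are made up.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels for illustration only (1 = hate, 0 = no_hate).
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
toy_accuracy = accuracy(gold, pred)
toy_f1 = binary_f1(gold, pred)
```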