---
license: mit
datasets:
- TTA-DQA/hate_sentence
language:
- ko
metrics:
- accuracy
base_model:
- beomi/KcELECTRA-base-v2022
tags:
- Text-Classification
- Hate-Detection
- Hate-Sentence-Detection
---
# Model Details ([readme.md - English version](https://huggingface.co/TTA-DQA/HateDetection-KcElectra-FineTuning/blob/main/readme-eng.md))
### 1. Overview
This model is trained to detect whether a Korean sentence contains harmful expressions. <br>
It performs binary classification, deciding whether a sentence contains harmful expressions or is an ordinary sentence. <br>
Its AI task is text-classification, and it is fine-tuned on the TTA-DQA/hate_sentence dataset. <br>
The classes are as follows.
- "0": "no_hate"
- "1": "hate"
### 2. Training Information
- Base Model: KcElectra (a pre-trained Korean language model based on Electra)
- Source: beomi/KcELECTRA-base-v2022 (https://huggingface.co/beomi/KcELECTRA-base-v2022)
- Model Type: Causal Language Model
- Pre-training (Korean): approx. 17GB (over 180 million sentences)
- Fine-tuning (hate dataset): approx. 22.3MB (TTA-DQA/hate_sentence)
- Learning Rate: 5e-6
- Weight Decay: 0.01
- Epochs: 20
- Batch Size: 16
- Data Loader Workers: 2
- Tokenizer: BertWordPieceTokenizer
- Model Size: Approximately 512MB
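
The hyperparameters above can be collected into a small configuration object. The following is a minimal sketch, not the authors' training script: the class name `FineTuneConfig` and the helper `training_kwargs` are hypothetical, the field names are chosen to match parameters of `transformers.TrainingArguments`, and treating the batch size of 16 as a per-device value is an assumption.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this model card (hypothetical container)."""
    base_model: str = "beomi/KcELECTRA-base-v2022"
    learning_rate: float = 5e-6
    weight_decay: float = 0.01
    num_train_epochs: int = 20
    per_device_train_batch_size: int = 16  # assumption: single-device training
    dataloader_num_workers: int = 2


def training_kwargs(cfg):
    """Return the fields that map onto TrainingArguments keyword arguments."""
    kwargs = asdict(cfg)
    kwargs.pop("base_model")  # the base model is passed to from_pretrained, not to the trainer
    return kwargs
```

If the `Trainer` API is used, these values could be passed as `TrainingArguments(output_dir="out", **training_kwargs(FineTuneConfig()))`, where `"out"` is a placeholder directory.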
### 3. Requirements
- pytorch ~= 1.8.0
- transformers ~= 4.11.3
- emoji ~= 0.6.0
- soynlp ~= 0.0.493
### 4. Quick Start
- python
```python
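from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier, including its classification head
tokenizer = AutoTokenizer.from_pretrained("TTA-DQA/HateDetection-KcElectra-FineTuning")
model = AutoModelForSequenceClassification.from_pretrained("TTA-DQA/HateDetection-KcElectra-FineTuning")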
```
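
To go from loaded weights to a prediction, the sentences are tokenized, passed through the classifier, and the logits are converted to a label. The sketch below shows one way this might look. The helper names `load` and `classify` are hypothetical, the label mapping is copied from the class list in this card rather than read from the model config, and the softmax is done in plain Python to keep the helper easy to follow.

```python
import math

MODEL_ID = "TTA-DQA/HateDetection-KcElectra-FineTuning"
ID2LABEL = {0: "no_hate", 1: "hate"}  # class mapping as listed in this model card


def load(model_id=MODEL_ID):
    """Load the tokenizer and classifier (transformers is imported lazily)."""
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def classify(texts, tokenizer, model):
    """Return a (label, probability) pair for each input sentence."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # For large batches, consider running the forward pass under torch.no_grad().
    logits = model(**inputs).logits.detach().tolist()
    results = []
    for row in logits:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        best = probs.index(max(probs))
        results.append((ID2LABEL[best], probs[best]))
    return results
```

Typical usage would be `tokenizer, model = load()` followed by `classify(["..."], tokenizer, model)` with a list of Korean sentences.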
### 5. Citation
- This model was built under the Hyperscale AI Training Data Quality Verification project (2024 Hyperscale AI Training Quality Verification).
### 6. Limitations, Risks, and Biases
- The model was not trained on data skewed toward either class, but because of linguistic and interpretive characteristics there may be disagreement about individual labels.
- Harmful expressions are partly subjective, depending on language, culture, application domain, and personal views, so the results may be biased or contested.
- Therefore, the model's output cannot serve as an absolute standard for harmful expressions in Korean.
# Model Performance
- Classification type: binary classification (text-classification)
- F1-score: 0.9928
- Accuracy: 0.9928
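
For reference, the sketch below shows how accuracy and a binary F1-score are commonly computed from gold and predicted labels. It is an illustration only: the card does not say which averaging scheme or evaluation split produced the reported numbers, so treating class `1` ("hate") as the positive class is an assumption, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy and F1 for the positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "f1": f1}
```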