Model Details (readme.md - English version)

1. Overview

This model was trained to detect whether a Korean sentence contains harmful expressions.
It performs binary classification, deciding whether a sentence contains harmful expressions or is an ordinary sentence.
In terms of AI task, it falls under text-classification. The dataset used is TTA-DQA/hate_sentence.

The classes are as follows.

  • "0": "no_hate"
  • "1": "hate"
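
The class indices above can be mapped back to labels by taking the highest-scoring of the model's two output logits. The following is a minimal sketch in plain Python; the names `ID2LABEL`, `softmax`, and `decode` are illustrative and not part of the model card, and the example logits are made up.

```python
import math

# Class mapping as listed in this model card.
ID2LABEL = {0: "no_hate", 1: "hate"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one sentence: [score for class 0, score for class 1].
label, prob = decode([-1.2, 2.3])
print(label, round(prob, 3))
```

The same mapping could be applied to real logits returned by the model; only the source of the numbers changes.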

2. Training Information

  • Base Model: KcElectra (a pre-trained Korean language model based on Electra)
  • Source: beomi/KcELECTRA-base-v2022 (https://huggingface.co/beomi/KcELECTRA-base-v2022)
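  • Model Type: Causal Language Model (as listed by the authors; ELECTRA itself is an encoder-only model pre-trained with replaced-token detection)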
  • Pre-training (Korean): approximately 17GB (over 180 million sentences)
  • Fine-tuning (hate dataset): approximately 22.3MB (TTA-DQA/hate_sentence)
  • Learning Rate: 5e-6
  • Weight Decay: 0.01
  • Epochs: 20
  • Batch Size: 16
  • Data Loader Workers: 2
  • Tokenizer: BertWordPieceTokenizer
  • Model Size: Approximately 512MB
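
The hyperparameters listed above could be expressed as keyword arguments for `transformers.TrainingArguments`. This is a sketch under assumptions: the card does not say that the `Trainer` API was used, the batch size is assumed to be per device, and `output_dir` and the helper names are hypothetical.

```python
# Hyperparameters as listed in this model card, keyed by the matching
# transformers.TrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 5e-6,
    "weight_decay": 0.01,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 16,  # assumption: 16 is the per-device size
    "dataloader_num_workers": 2,
}


def build_training_args(output_dir="./kcelectra-hate"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def total_optimizer_steps(num_examples, kwargs=TRAINING_KWARGS):
    """Optimizer steps for a single-device run without gradient accumulation."""
    batch = kwargs["per_device_train_batch_size"]
    steps_per_epoch = -(-num_examples // batch)  # ceiling division
    return steps_per_epoch * kwargs["num_train_epochs"]
```

For example, `total_optimizer_steps(1000)` gives the step count for a hypothetical 1,000-sentence training set; the actual number of training examples is not stated in this card.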

3. Requirements

  • pytorch ~= 1.8.0
  • transformers ~= 4.11.3
  • emoji ~= 0.6.0
  • soynlp ~= 0.0.493
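
To compare an environment against the versions listed above, the installed versions can be read with the standard library. This is a minimal sketch; `REQUIREMENTS` and `installed_versions` are illustrative names, and note that PyTorch is distributed on PyPI as `torch`.

```python
from importlib import metadata

# Requirement name in this card -> (PyPI distribution name, listed base version).
REQUIREMENTS = {
    "pytorch": ("torch", "1.8.0"),
    "transformers": ("transformers", "4.11.3"),
    "emoji": ("emoji", "0.6.0"),
    "soynlp": ("soynlp", "0.0.493"),
}


def installed_versions(requirements=REQUIREMENTS):
    """Map each requirement to its installed version, or None if it is missing."""
    found = {}
    for name, (dist, _base) in requirements.items():
        try:
            found[name] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions().items():
    base = REQUIREMENTS[name][1]
    print(f"{name}: installed={version}, card lists ~= {base}")
```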

4. Quick Start

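  • python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned model together with its classification head.
tokenizer = AutoTokenizer.from_pretrained("TTA-DQA/HateDetection-KcElectra-FineTuning")
model = AutoModelForSequenceClassification.from_pretrained("TTA-DQA/HateDetection-KcElectra-FineTuning")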
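
For end-to-end inference, one option is the `pipeline` helper from transformers, which handles tokenization and label lookup. The sketch below assumes the model's config exposes the labels as "hate"/"no_hate" (as in the class list of this card); `load_classifier`, `is_hate`, `demo`, and the sample sentence are hypothetical and not taken from the original card.

```python
MODEL_ID = "TTA-DQA/HateDetection-KcElectra-FineTuning"


def load_classifier(model_id=MODEL_ID):
    """Create a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; needs transformers and torch

    return pipeline("text-classification", model=model_id)


def is_hate(prediction, threshold=0.5):
    """Interpret one pipeline result such as {"label": "hate", "score": 0.98}.

    Assumes the label strings are "hate"/"no_hate"; "LABEL_1" is also accepted
    in case the model config keeps the default label names.
    """
    flagged = prediction["label"] in ("hate", "LABEL_1")
    return flagged and prediction["score"] >= threshold


def demo():
    """Classify a sample sentence; requires network access to fetch the model."""
    classifier = load_classifier()
    sentence = "오늘 날씨가 정말 좋네요."  # "The weather is really nice today."
    result = classifier(sentence)[0]
    print(sentence, result, is_hate(result))
```

Calling `demo()` runs the full path. Because the pipeline returns only the top label with its softmax score, a `threshold` above 0.5 makes the check stricter, which may be useful when false positives are costly.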

5. Citation

  • This model was built as part of the Hyperscale AI Training Data Quality Verification Project (2024 Hyperscale AI Training Data Quality Verification).

6. Limitations, Risks, and Bias

  • The model was not trained on data skewed toward either class; however, because of linguistic and interpretive characteristics, there may be disagreement about the labels.
  • What counts as a harmful expression is partly subjective and depends on language, culture, application domain, and personal views, so the results may be biased or contested.
  • Therefore, the results cannot serve as an absolute standard for harmful expressions in Korean.

Model Performance Results

  • Classification type : binary classification (text-classification)
  • f1-score : 0.9928
  • accuracy : 0.9928
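
For reference, accuracy and F1 for a binary classifier can be computed from the confusion counts as sketched below. The labels and predictions are made-up illustrative data, not the evaluation set behind the scores above, and the card does not state which F1 averaging was used; this sketch computes F1 for the positive ("hate") class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 1 = "hate", 0 = "no_hate".
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```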