λͺ¨λΈ 상세 정보 (readme.md English Version)

1. Overview

이 λͺ¨λΈμ€ ν•œκ΅­μ–΄ λ¬Έμž₯ λ‚΄ μœ ν•΄ν‘œν˜„μ΄ ν¬ν•¨λ˜μ–΄μžˆλŠ”μ§€, 그리고 μœ ν•΄ν‘œν˜„μ˜ μœ ν˜•μ„ κ²€μΆœν•˜κΈ° μœ„ν•΄ ν•™μŠ΅λœ λͺ¨λΈμž…λ‹ˆλ‹€.
multi-label classification을 μˆ˜ν–‰ν•˜λ©°, μœ ν•΄ν‘œν˜„μ΄ ν¬ν•¨λ˜μ—ˆκ±°λ‚˜ 일반적인 λ¬Έμž₯인지 νŒλ‹¨(λΆ„λ₯˜)ν•˜λŠ” λͺ¨λΈμž…λ‹ˆλ‹€.
AI-Taskλ‘œλŠ” text-classification(multi-label)에 ν•΄λ‹Ήν•©λ‹ˆλ‹€. μ‚¬μš©ν•˜λŠ” 데이터셋은 TTA-DQA/hate_sentence μž…λ‹ˆλ‹€.
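The dataset is hosted on the Hugging Face Hub, so it can be inspected directly. Below is a minimal loading sketch; the split and column names are not documented in this card, so the snippet only prints the dataset structure.

```python
# Minimal sketch: inspect the TTA-DQA/hate_sentence dataset from the Hub.
# Split and column names are not documented in this card, so we just print them.
from datasets import load_dataset

ds = load_dataset("TTA-DQA/hate_sentence")
print(ds)  # shows the available splits, columns, and row counts
```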

The class composition is as follows (see the dictionary sketch after the list):

  • 0: 'insult'
  • 1: 'abuse'
  • 2: 'obscenity'
  • 3: 'TVPC' # threats of violence / promotion of crime
  • 4: 'sexuality'
  • 5: 'age'
  • 6: 'race_region' # race and region
  • 7: 'disabled'
  • 8: 'religion'
  • 9: 'politics'
  • 10: 'job'
  • 11: 'no_hate'
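
For reference, the list above can be written as a plain Python mapping. This mirrors the class list in this card; the id2label mapping stored in the model config should be treated as authoritative if it differs.

```python
# Label ids and names as listed in this card (the model config's own
# id2label mapping takes precedence if it differs).
ID2LABEL = {
    0: "insult",
    1: "abuse",
    2: "obscenity",
    3: "TVPC",         # threats of violence / promotion of crime
    4: "sexuality",
    5: "age",
    6: "race_region",  # race and region
    7: "disabled",
    8: "religion",
    9: "politics",
    10: "job",
    11: "no_hate",
}
```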

2. Training Information

  • Base Model: KcElectra (a pre-trained Korean language model based on Electra)
  • Source: beomi/KcELECTRA-base-v2022 (https://huggingface.co/beomi/KcELECTRA-base-v2022)
  • Model Type: ELECTRA-based sequence classification model (multi-label)
  • Pre-training (Korean): approx. 17GB (over 180 million sentences)
  • Fine-tuning (hate dataset): approx. 28.9MB (TTA-DQA/hate_sentence)
  • Learning Rate: 5e-6
  • Weight Decay: 0.01
  • Epochs: 30
  • Batch Size: 16
  • Data Loader Workers: 2
  • Tokenizer: BertWordPieceTokenizer
  • Model Size: Approximately 511MB
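
The hyperparameters above map naturally onto the Hugging Face Trainer. The sketch below is a hypothetical reconstruction, not the published training script; dataset preprocessing (tokenization and multi-hot label encoding) is omitted, and the use of `problem_type="multi_label_classification"` is an assumption consistent with the multi-label task described in this card.

```python
# Hypothetical sketch: how the listed hyperparameters map onto TrainingArguments.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("beomi/KcELECTRA-base-v2022")
model = AutoModelForSequenceClassification.from_pretrained(
    "beomi/KcELECTRA-base-v2022",
    num_labels=12,                              # 12 classes listed above
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

args = TrainingArguments(
    output_dir="./hate-detection-kcelectra",    # hypothetical output path
    learning_rate=5e-6,
    weight_decay=0.01,
    num_train_epochs=30,
    per_device_train_batch_size=16,
    dataloader_num_workers=2,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=tokenized_train,  # prepared from TTA-DQA/hate_sentence
#                   eval_dataset=tokenized_eval)
# trainer.train()
```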

3. Requirements

  • pytorch ~= 1.8.0
  • transformers ~= 4.11.3
  • emoji ~= 0.6.0
  • soynlp ~= 0.0.493

4. Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned multi-label hate-expression classifier and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning")
model = AutoModelForSequenceClassification.from_pretrained("TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning")
```
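
Below is a minimal inference sketch for multi-label prediction with the model above. The 0.5 sigmoid threshold and the example sentence are assumptions for illustration; label ids can be mapped to names with the model config's id2label or the class list in section 1.

```python
# Sketch: multi-label inference with a per-class sigmoid and a 0.5 threshold
# (the threshold is an assumption, not specified in this card).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "예시 λ¬Έμž₯μž…λ‹ˆλ‹€."  # hypothetical example sentence
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.sigmoid(logits)[0]                      # one probability per class
predicted = [i for i, p in enumerate(probs) if p > 0.5]
print(predicted, probs.tolist())
```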

5. Citation

  • 이 λͺ¨λΈμ€ μ΄ˆκ±°λŒ€AI ν•™μŠ΅μš© 데이터 ν’ˆμ§ˆκ²€μ¦ 사업(2024년도 μ΄ˆκ±°λŒ€AI ν•™μŠ΅μš© ν’ˆμ§ˆκ²€μ¦)에 μ˜ν•΄μ„œ κ΅¬μΆ•λ˜μ—ˆμŠ΅λ‹ˆλ‹€

6. Bias, Risks, and Limitations

  • λ³Έ λͺ¨λΈμ€ 각 클래슀 별 λ°μ΄ν„°μ˜ 양이 λ‹€μ†Œ 편ν–₯적인 뢀뢄이 μžˆμŠ΅λ‹ˆλ‹€.
  • λ˜ν•œ 클래슀 기쀀에 λŒ€ν•΄μ„œ, 언어적, 언어해석적 νŠΉμ„±μ— μ˜ν•΄ λ ˆμ΄λΈ”μ— λŒ€ν•œ 이견이 μžˆμ„ 수 μžˆμŠ΅λ‹ˆλ‹€.
  • μœ ν•΄ν‘œν˜„μ˜ 경우 μ–Έμ–΄, λ¬Έν™”, 적용 λΆ„μ•Ό, 개인적 견해에 따라 주관적인 뢀뢄이 μžˆμ–΄ 결과에 λŒ€ν•œ 편ν–₯ λ˜λŠ” λ…Όλž€μ΄ μžˆμ„ 수 μžˆμŠ΅λ‹ˆλ‹€.
  • λ”°λΌμ„œ, κ²°κ³Όκ°€ ν•œκ΅­μ–΄μ— λŒ€ν•œ μ ˆλŒ€μ μΈ μœ ν•΄ν‘œν˜„μ˜ 기쀀이 될 수 λŠ” μ—†μŠ΅λ‹ˆλ‹€.

7. Experimental Results

  • type: multi-label classification (text-classification)
  • f1-score: 0.8279
  • accuracy: 0.7013 (see the metric computation sketch below)
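
The sketch below shows one way such multi-label metrics can be computed; the averaging method ("micro" here), the decision threshold, and the toy arrays are assumptions, as the card does not specify how the reported numbers were obtained.

```python
# Hypothetical sketch: computing f1 and accuracy for multi-label predictions.
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

y_true = np.array([[1, 0, 0], [0, 1, 1]])  # multi-hot ground-truth labels (toy data)
y_pred = np.array([[1, 0, 0], [0, 1, 0]])  # thresholded model predictions (toy data)

print("f1 (micro):", f1_score(y_true, y_pred, average="micro"))
print("accuracy (exact match):", accuracy_score(y_true, y_pred))
```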