Now100/kmhas_electra_binary

Text Classification

binary-classification

Now100 committed 9 days ago

Commit aee84c4 · verified · 1 parent: d4755f8

Create README.md

README.md +87 -0

	@@ -0,0 +1,87 @@

+---
+language: ko
+tags:
+  - hate-speech
+  - binary-classification
+  - electra
+  - korean
+  - transformers
+license: cc-by-4.0
+datasets:
+  - jeanlee/kmhas_korean_hate_speech
+model-index:
+  - name: kmhas_electra_binary
+    results:
+      - task:
+          name: Text Classification
+          type: text-classification
+        dataset:
+          name: KMHAS Korean Hate Speech
+          type: jeanlee/kmhas_korean_hate_speech
+        metrics:
+          - name: Accuracy
+            type: accuracy
+            value: 0.91
+          - name: F1
+            type: f1
+            value: 0.91
+          - name: Precision
+            type: precision
+            value: 0.91
+          - name: Recall
+            type: recall
+            value: 0.91
+---
+# KMHAS Korean Hate Speech Classifier (Binary)
+
+A binary text classification model that predicts whether a Korean sentence contains hate speech.
+
+- Base model: [`beomi/KcELECTRA-base-v2022`](https://huggingface.co/beomi/KcELECTRA-base-v2022)
+- Training data: [KMHAS Korean hate speech dataset](https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech)
+
+---
+## Training Details
+
+- **Train Set**: 78,977 examples
+- **Validation Set**: 8,776 examples
+- **Test Set**: 21,939 examples
+- **Base Model**: `beomi/KcELECTRA-base-v2022`
+- **Epochs**: 5
+- **Batch Size**: 16 (train/eval)
+- **Evaluation Strategy**: evaluate at the end of every epoch
+- **Save Strategy**: save a checkpoint at the end of every epoch (keep at most 1)
+
+---
+## Evaluation (Test Set)
+
+| Metric     | Value |
+|------------|-------|
+| Accuracy   | 0.91  |
+| Precision  | 0.91  |
+| Recall     | 0.91  |
+| F1-score   | 0.91  |
+
+Per-class performance:
+
+- **hate**: Precision 0.92 / Recall 0.91 / F1 0.92
+- **non-hate**: Precision 0.90 / Recall 0.91 / F1 0.90
+
+---
+## Usage
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+# Load the fine-tuned classifier and its tokenizer from this repository.
+model_id = "Now100/kmhas_electra_binary"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+text = "전교조 개새끼들이 나라를 망치고 있다."
+inputs = tokenizer(text, return_tensors="pt")
+with torch.no_grad():  # gradients are not needed for inference
+    outputs = model(**inputs)
+label = outputs.logits.argmax(dim=1).item()
+# Label mapping as given in this README: 1 = non-hate, 0 = hate.
+print("Prediction:", "non-hate" if label == 1 else "hate")
+```
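
The usage example in the README prints only the argmax label. When a confidence score is also useful, the two logits can be turned into probabilities with a softmax. The sketch below is a minimal, dependency-free illustration and not part of the committed README. It assumes the label order shown in the README example (index 0 = hate, index 1 = non-hate). The helper names `softmax` and `decode_prediction` and the `hate_threshold` parameter are hypothetical, and the logits are made-up values rather than real model output.

```python
import math

# Label order assumed from the README example: index 0 = hate, index 1 = non-hate.
ID2LABEL = {0: "hate", 1: "non-hate"}


def softmax(logits):
    """Convert raw logits to probabilities (max is subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, hate_threshold=0.5):
    """Return (label, hate_probability) for one row of two logits.

    hate_threshold is a hypothetical knob for illustration; at 0.5 the
    result matches taking the argmax of the two logits.
    """
    if len(logits) != 2:
        raise ValueError("expected two logits for a binary classifier")
    hate_prob = softmax(logits)[0]
    label = ID2LABEL[0] if hate_prob >= hate_threshold else ID2LABEL[1]
    return label, hate_prob


# Made-up logits for illustration only.
label, prob = decode_prediction([2.0, -1.0])
print(label, round(prob, 3))
```

With the real model, one row of logits could be passed in as `outputs.logits[0].tolist()`. Raising `hate_threshold` above 0.5 would trade recall on the hate class for precision, which may or may not suit a given moderation setting.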