# Model Detail Information

### 1. Overview

This model is trained to detect the presence of harmful expressions in Korean sentences.<br>
It performs multi-label classification to determine whether a given sentence contains hateful expressions.<br>
It is designed for the 'multi-label text-classification' AI task and was trained on the 'TTA-DQA/hate_sentence' dataset.<br>

The classification labels are listed below; the sketch after the list shows how they map to a multi-hot target vector:
- 0: 'insult'
- 1: 'abuse'
- 2: 'obscenity'
- 3: 'TVPC' (threats of violence / promotion of crime)
- 4: 'sexuality'
- 5: 'age'
- 6: 'race_region' (race and region)
- 7: 'disabled'
- 8: 'religion'
- 9: 'politics'
- 10: 'job'
- 11: 'no_hate'

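Because the task is multi-label, a single sentence can activate several of these labels at once. A minimal sketch of the mapping, assuming the index order above (the names `ID2LABEL` and `example_target` are illustrative, not part of the released model):

```python
# Index-to-label mapping, copied from the list above.
ID2LABEL = {
    0: "insult", 1: "abuse", 2: "obscenity", 3: "TVPC",
    4: "sexuality", 5: "age", 6: "race_region", 7: "disabled",
    8: "religion", 9: "politics", 10: "job", 11: "no_hate",
}

# Multi-label targets are multi-hot vectors: a sentence that is both
# insulting and political gets 1s at index 0 ('insult') and index 9 ('politics').
example_target = [0.0] * len(ID2LABEL)
example_target[0] = 1.0  # insult
example_target[9] = 1.0  # politics
```
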
### 2. Training Information

- Base Model: KcELECTRA (a pre-trained Korean language model based on ELECTRA)
- Source: [beomi/KcELECTRA-base-v2022](https://huggingface.co/beomi/KcELECTRA-base-v2022)
- Model Type: Causal Language Model
- Pre-training (Korean): approximately 17 GB (over 180 million sentences)
- Fine-tuning (hate dataset): approximately 28.9 MB (TTA-DQA/hate_sentence)
- Learning Rate: 5e-6
- Weight Decay: 0.01
- Epochs: 30
- Batch Size: 16
- Data Loader Workers: 2
- Tokenizer: BertWordPieceTokenizer
- Model Size: approximately 511 MB

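For reference, these settings translate into a Hugging Face `Trainer` configuration roughly as follows. This is a minimal sketch, not the authors' actual training script: the output directory is hypothetical, the data pipeline is omitted, and `problem_type="multi_label_classification"` (which switches the loss to BCE-with-logits) is an assumption about how the multi-label head was trained.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumption: problem_type selects the multi-label (BCE-with-logits) loss;
# it is available in transformers >= 4.10, so within the pinned ~=4.11.3.
model = AutoModelForSequenceClassification.from_pretrained(
    "beomi/KcELECTRA-base-v2022",
    num_labels=12,
    problem_type="multi_label_classification",
)
tokenizer = AutoTokenizer.from_pretrained("beomi/KcELECTRA-base-v2022")

# Hyperparameters copied from the list above.
args = TrainingArguments(
    output_dir="hate-detection-kcelectra",  # hypothetical path
    learning_rate=5e-6,
    weight_decay=0.01,
    num_train_epochs=30,
    per_device_train_batch_size=16,
    dataloader_num_workers=2,
)

# train_dataset is assumed to be TTA-DQA/hate_sentence, tokenized,
# with multi-hot float labels; it is not constructed here.
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```
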
### 3. Requirements

To use this model, ensure the following dependencies are installed:
- pytorch ~= 1.8.0
- transformers ~= 4.11.3
- emoji ~= 0.6.0
- soynlp ~= 0.0.493

### 4. Quick Start

- python
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned checkpoint together with its multi-label
# classification head (AutoModel alone would drop the head).
tokenizer = AutoTokenizer.from_pretrained("TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning")
model = AutoModelForSequenceClassification.from_pretrained("TTA-DQA/Hate-Detection-MultiLabel-KcElectra-FineTuning")
```
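With the Quick Start block above in scope, per-label predictions can be read off the logits. A minimal inference sketch, assuming a per-label sigmoid and a conventional 0.5 decision threshold (neither is documented in this card):

```python
import torch

text = "검증할 한국어 문장"  # hypothetical input sentence
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label heads are typically read through a per-label sigmoid;
# 0.5 is a common (assumed) decision threshold.
probs = torch.sigmoid(logits)[0]
predicted_ids = [i for i, p in enumerate(probs.tolist()) if p > 0.5]
print(predicted_ids)  # indices into the label list in section 1
```
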
### 5. Citation

- This model was developed as part of the Quality Validation Project for Super-Giant AI Training Data (305-2100-2131, 2024 Quality Validation for Super-Giant AI Training).

### 6. Bias, Risks, and Limitations

- The determination of harmful expressions may vary depending on language, culture, application context, and personal perspectives.
- Results may reflect biases or lead to controversy due to the subjective nature of evaluating harmful content.
- This model's outputs should not be considered a definitive standard for identifying harmful expressions.

### 7. Results

- Type: multi-label classification (text-classification)
- F1-score: 0.8279
- Accuracy: 0.7013

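The averaging used for the F1 score above is not stated. For multi-label outputs, these metrics are commonly computed as in the sketch below (scikit-learn is an assumed extra dependency, and the vectors are made up); note that `accuracy_score` on multi-hot arrays is strict subset accuracy, which is typically lower than F1, consistent with the numbers above.

```python
from sklearn.metrics import accuracy_score, f1_score

# y_true / y_pred: multi-hot arrays of shape (num_samples, 12),
# ordered as the label list in section 1. Values here are made up.
y_true = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # insult + politics
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # no_hate
]
y_pred = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # politics missed
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # exact match
]

print("micro F1:", f1_score(y_true, y_pred, average="micro"))
print("subset accuracy:", accuracy_score(y_true, y_pred))
```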