rjavadi committed cb1c037 (verified) · Parent: eb2206f

Update README.md

Files changed (1): README.md (+92, -0)
README.md CHANGED
@@ -27,3 +27,95 @@ model-index:
      value: 0.5922

 ---

# Model Card for vector-institute/nmb-plus-bias-ner-bert

A fine-tuned DistilBERT model for Named Entity Recognition (NER) in bias detection.

## Model Details

We used `distilbert-base-uncased` and fine-tuned it on the `vector-institute/NMB-Plus-Named-Entities` dataset.
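
For reference, here is a minimal sketch of how the dataset can be prepared for token-classification fine-tuning. It assumes the dataset follows the common Hub layout with `tokens` and `ner_tags` columns and that a fast tokenizer is used; the column names and preprocessing details are assumptions, not values confirmed by this card.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed column names: `tokens` (list of words) and `ner_tags` (word-level label ids).
dataset = load_dataset("vector-institute/NMB-Plus-Named-Entities")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_and_align_labels(examples):
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, word_labels in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous_word = None
        label_ids = []
        for word_id in word_ids:
            if word_id is None:
                label_ids.append(-100)                   # special tokens: ignored by the loss
            elif word_id != previous_word:
                label_ids.append(word_labels[word_id])   # label the first sub-token of each word
            else:
                label_ids.append(-100)                   # mask remaining sub-tokens
            previous_word = word_id
        labels.append(label_ids)
    tokenized["labels"] = labels
    return tokenized

tokenized_dataset = dataset.map(tokenize_and_align_labels, batched=True)
```

Masking sub-tokens with `-100` keeps them out of the loss, which is the standard recipe for word-level NER labels.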

## How to Get Started with the Model

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "vector-institute/nmb-plus-bias-ner-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Label set used during fine-tuning
label_list = ["O", "B-BIAS", "I-BIAS"]
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    id2label=id2label,
    label2id=label2id,
)

ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Fox News reported that Joe Biden met with CNN executives."
predictions = ner_pipeline(text)
print(predictions)
```
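
By default the token-classification pipeline returns one prediction per sub-token. If you prefer word- or span-level output, the pipeline's standard `aggregation_strategy` option can be used; the exact spans shown for this model are illustrative, not guaranteed.

```python
# Reuses `model`, `tokenizer`, and `text` from the snippet above.
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner_pipeline(text))  # merged BIAS spans with start/end offsets and scores
```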

## Training Hyperparameters

- **Training regime:** we fine-tuned with the following `TrainingArguments`:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    output_dir="./results",
    logging_dir="./logs",
    logging_steps=50,
    group_by_length=True,
)
```
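
A minimal sketch of how these arguments would typically be wired into a `Trainer` run is shown below. The base checkpoint, data collator, and split names are assumptions based on the standard token-classification recipe (it also reuses `tokenizer`, `tokenized_dataset`, and `training_args` from the snippets above), not details confirmed by this card.

```python
from transformers import (
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
)

label_list = ["O", "B-BIAS", "I-BIAS"]
id2label = {i: l for i, l in enumerate(label_list)}
label2id = {l: i for i, l in enumerate(label_list)}

# Assumed starting point: the base DistilBERT checkpoint with a fresh token-classification head.
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],          # split names are assumptions
    eval_dataset=tokenized_dataset["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer=tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```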

## Evaluation

We split the data into train (80%), validation (10%), and test (10%) sets.
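
One way to produce such an 80/10/10 split with the `datasets` library is sketched below; the actual procedure and seed used for this model are not documented here, so treat this as illustrative only.

```python
from datasets import load_dataset

dataset = load_dataset("vector-institute/NMB-Plus-Named-Entities")

# Illustrative 80/10/10 split: carve off 20%, then halve it into validation and test.
# The source split name ("train") and the seed are assumptions, not reported values.
split = dataset["train"].train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)
train_ds, validation_ds, test_ds = split["train"], holdout["train"], holdout["test"]
```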

### Results

We used common classification metrics:
- precision
- recall
- f1-score

#### Overall Results:

| Metric           | Precision | Recall | F1-Score | Support |
|------------------|-----------|--------|----------|---------|
| **Macro Avg**    | 0.6405    | 0.5589 | 0.5922   | 48710   |
| **Weighted Avg** | 0.9330    | 0.9418 | 0.9366   | 48710   |

#### Per-class Results:

| Label      | Precision | Recall | F1-Score | Support |
|------------|-----------|--------|----------|---------|
| **O**      | 0.9615    | 0.9792 | 0.9703   | 45921   |
| **B-BIAS** | 0.5314    | 0.4183 | 0.4681   | 930     |
| **I-BIAS** | 0.4286    | 0.2792 | 0.3381   | 1859    |
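
The per-label rows above are token-level scores. A sketch of how such a report could be computed with scikit-learn is shown below, assuming `predictions` (logits) and `labels` come from the Trainer's evaluation output and that `-100` marks ignored sub-token positions; these are assumptions based on the standard recipe, not details stated in this card.

```python
import numpy as np
from sklearn.metrics import classification_report

def token_level_report(predictions, labels, label_list):
    # predictions: (batch, seq_len, num_labels) logits; labels: (batch, seq_len) ids with -100 masked.
    pred_ids = np.argmax(predictions, axis=-1)
    y_true, y_pred = [], []
    for pred_row, label_row in zip(pred_ids, labels):
        for p, l in zip(pred_row, label_row):
            if l != -100:                      # skip special/masked sub-token positions
                y_true.append(label_list[l])
                y_pred.append(label_list[p])
    return classification_report(y_true, y_pred, digits=4)

# print(token_level_report(predictions, labels, ["O", "B-BIAS", "I-BIAS"]))
```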

## Environmental Impact

Total energy consumption for fine-tuning was 0.032804 kWh.

**Local CO₂ Emissions:** approximately 3.12 grams of CO₂ equivalent.
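
Taken together, these figures imply an effective emission factor of roughly 3.12 g ÷ 0.032804 kWh ≈ 95 gCO₂eq/kWh; this is an inference from the two numbers above, not a separately reported value.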