collinzrj committed on
Commit 63761ff · verified · 1 Parent(s): 7d2a026

Update README.md

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -20,7 +20,7 @@ The code used to produce the abliteration is at [https://github.com/andyrdt/refu
  ## Harmbench-eval
  When evaluated on Harmbench, DeepSeek-R1-Distill-Llama-8B has a score of 0.35, while DeepSeek-R1-Distill-Llama-8B-abliterate has a score of 0.68

- | Category | BaseModel | Abliteration |
+ | Category | Abliteration | BaseModel |
  |------------------------------|---------|---------|
  | Disinformation | 0.4 | 0.4 |
  | Economic Harm | 0.8 | 0.2 |
@@ -34,6 +34,21 @@ When evaluated on Harmbench, DeepSeek-R1-Distill-Llama-8B has a score of 0.35, w
  | Sexual/Adult Content | 0.8 | 0.0 |
  | **Overall Harmful Rate** | **0.68** | **0.35** |
 
+
+ | Category | Abliteration | BaseModel |
+ |------------------------------|---------|---------|
+ | Disinformation | 0.4 | 0.4 |
+ | Economic Harm | 0.8 | 0.2 |
+ | Expert Advice | 0.8 | 0.5 |
+ | Fraud/Deception | 0.8 | 0.5 |
+ | Government Decision-making | 0.6 | 0.6 |
+ | Harassment/Discrimination | 0.3 | 0.2 |
+ | Malware/Hacking | 0.9 | 0.3 |
+ | Physical Harm | 0.8 | 0.2 |
+ | Privacy | 0.6 | 0.6 |
+ | Sexual/Adult Content | 0.8 | 0.0 |
+ | Overall Harmful Rate | 0.68 | 0.35 |
+
  ## Usage
  Example code to generate with the model
  ```