Xiaojian9992024 committed on
Commit d2b9510 · verified · 1 Parent(s): dd03a9d

Adding Evaluation Results (#1)

- Adding Evaluation Results (8148f7a3bb35f9fa20f3ce4bc2215f96c67e5eba)

Files changed (1)
  1. README.md +115 -7
README.md CHANGED
@@ -1,12 +1,6 @@
  ---
  language:
  - en
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- - fblgit/cybertron-v4-qw7B-MGS
- - huihui-ai/Qwen2.5-7B-Instruct-abliterated-v3
- - FreedomIntelligence/HuatuoGPT-o1-7B
- - rombodawg/Rombos-LLM-V2.5-Qwen-7b
  tags:
  - mergekit
  - merge
@@ -17,6 +11,107 @@ tags:
  - boolean-expression-champion
  - math-avoider
  - object-counting-struggler
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ - fblgit/cybertron-v4-qw7B-MGS
+ - huihui-ai/Qwen2.5-7B-Instruct-abliterated-v3
+ - FreedomIntelligence/HuatuoGPT-o1-7B
+ - rombodawg/Rombos-LLM-V2.5-Qwen-7b
+ model-index:
+ - name: Qwen2.5-THREADRIPPER-Small
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 76.89
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 35.79
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 47.36
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 8.05
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 13.93
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 37.3
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
+       name: Open LLM Leaderboard
  ---
  base_model:
  - fblgit/cybertron-v4-qw7B-MGS
@@ -139,4 +234,17 @@ models:
      parameters:
        weight: 0.05 # Medical intelligence? Maybe?
        density: 0.05 # Homeopathic dose of medical knowledge
- ´´´
+ ´´´
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Xiaojian9992024__Qwen2.5-THREADRIPPER-Small-details)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |36.55|
+ |IFEval (0-Shot)    |76.89|
+ |BBH (3-Shot)       |35.79|
+ |MATH Lvl 5 (4-Shot)|47.36|
+ |GPQA (0-shot)      | 8.05|
+ |MuSR (0-shot)      |13.93|
+ |MMLU-PRO (5-shot)  |37.30|
+
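
The `Avg.` row in the table added by this commit can be sanity-checked against the six per-benchmark values. The sketch below assumes the average is a plain unweighted mean rounded to two decimals; the commit does not state how it was computed, and the helper name `mean_score` is hypothetical.

```python
# Scores copied from the results table added in this commit.
scores = {
    "IFEval (0-Shot)": 76.89,
    "BBH (3-Shot)": 35.79,
    "MATH Lvl 5 (4-Shot)": 47.36,
    "GPQA (0-shot)": 8.05,
    "MuSR (0-shot)": 13.93,
    "MMLU-PRO (5-shot)": 37.30,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard 'Avg.' is a simple mean of the listed scores.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(average)  # 36.55, matching the 'Avg.' row
```

Under that assumption the computed value agrees with the reported 36.55, so the table and the `model-index` metadata are at least internally consistent.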