Novaciano committed
Commit ad7f283 · verified · 1 Parent(s): 0214564

Adding Evaluation Results (#1)

- Adding Evaluation Results (49f0ac9efaf2bae1d155727acd4ce10e81833193)

Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -18,6 +18,105 @@ tags:
 language:
 - en
 - es
+model-index:
+- name: Sigil-Of-Satan-3.2-1B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 54.94
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FSigil-Of-Satan-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 9.4
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FSigil-Of-Satan-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 5.44
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FSigil-Of-Satan-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.45
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FSigil-Of-Satan-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.42
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FSigil-Of-Satan-3.2-1B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 9.5
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Novaciano%2FSigil-Of-Satan-3.2-1B
+      name: Open LLM Leaderboard
 ---
 ## merge
 
@@ -100,4 +199,18 @@ base_model: Novaciano/BAPHOMET
 dtype: bfloat16
 parameters:
   t: [0, 0.5, 1, 0.5, 0]
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Novaciano__Sigil-Of-Satan-3.2-1B-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Novaciano%2FSigil-Of-Satan-3.2-1B&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric             |Value (%)|
+|-------------------|--------:|
+|**Average**        |    13.69|
+|IFEval (0-Shot)    |    54.94|
+|BBH (3-Shot)       |     9.40|
+|MATH Lvl 5 (4-Shot)|     5.44|
+|GPQA (0-shot)      |     1.45|
+|MuSR (0-shot)      |     1.42|
+|MMLU-PRO (5-shot)  |     9.50|
+
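Once this commit lands, the `model-index` block added above is machine-readable card metadata, not just documentation. A minimal sketch, assuming the `huggingface_hub` client (which parses a card's `model-index` into `EvalResult` objects), of reading the scores back programmatically:

```python
from huggingface_hub import ModelCard

# Load the README/model card for the repo this commit targets; the YAML
# front matter (including the model-index block) is parsed automatically.
card = ModelCard.load("Novaciano/Sigil-Of-Satan-3.2-1B")

# Each metric entry in model-index surfaces as one EvalResult.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_name} = {result.metric_value}")
```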
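A side note on the unchanged mergekit config visible at the top of the second hunk: a list-valued `t` such as `[0, 0.5, 1, 0.5, 0]` is a SLERP interpolation gradient. My understanding (an assumption about mergekit's gradient handling, not something this commit confirms) is that the list is sampled piecewise-linearly across layer depth, so the merge leans fully on the second model mid-stack and on the base model at both ends. A hypothetical sketch of that expansion:

```python
import numpy as np

def expand_gradient(gradient: list[float], num_layers: int) -> np.ndarray:
    """Hypothetical helper: spread a mergekit-style gradient list over layers."""
    anchors = np.linspace(0.0, 1.0, len(gradient))  # positions of the list values
    layers = np.linspace(0.0, 1.0, num_layers)      # normalized layer positions
    return np.interp(layers, anchors, gradient)     # piecewise-linear sampling

# For a 16-layer model (the depth of Llama 3.2 1B), t rises to 1.0
# mid-stack and tapers back to 0.0 at the first and last layers.
print(expand_gradient([0, 0.5, 1, 0.5, 0], 16).round(2))
```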
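And a quick arithmetic check on the summary table: the **Average** row is simply the mean of the six benchmark scores listed below it.

```python
# Scores copied from the table added in this commit.
scores = {
    "IFEval (0-Shot)": 54.94,
    "BBH (3-Shot)": 9.40,
    "MATH Lvl 5 (4-Shot)": 5.44,
    "GPQA (0-shot)": 1.45,
    "MuSR (0-shot)": 1.42,
    "MMLU-PRO (5-shot)": 9.50,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 13.69 -- matches the Average row
```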