lewtun (HF staff) committed
Commit 894bb01 · verified · 1 Parent(s): 90372c2

Upload eval_results/alignment-handbook/zephyr-2b-gemma-dpo-v2.3/main/mmlu/results_2024-03-05T12-20-09.376521.json with huggingface_hub

eval_results/alignment-handbook/zephyr-2b-gemma-dpo-v2.3/main/mmlu/results_2024-03-05T12-20-09.376521.json ADDED
@@ -0,0 +1,2949 @@
+ {
+ "config_general": {
+ "lighteval_sha": "?",
+ "num_fewshot_seeds": 1,
+ "override_batch_size": 1,
+ "max_samples": null,
+ "job_id": "",
+ "start_time": 1901441.684836846,
+ "end_time": 1902184.087957652,
+ "total_evaluation_time_secondes": "742.403120805975",
+ "model_name": "alignment-handbook/zephyr-2b-gemma-dpo-v2.3",
+ "model_sha": "05de867ca01a6e052ba25674971a5520f440d22e",
+ "model_dtype": "torch.bfloat16",
+ "model_size": "4.68 GB",
+ "config": null
+ },
+ "results": {
+ "lighteval|mmlu:abstract_algebra|5": {
+ "acc": 0.28,
+ "acc_stderr": 0.04512608598542129
+ },
+ "lighteval|mmlu:anatomy|5": {
+ "acc": 0.45925925925925926,
+ "acc_stderr": 0.04304979692464243
+ },
+ "lighteval|mmlu:astronomy|5": {
+ "acc": 0.3618421052631579,
+ "acc_stderr": 0.03910525752849724
+ },
+ "lighteval|mmlu:business_ethics|5": {
+ "acc": 0.31,
+ "acc_stderr": 0.04648231987117316
+ },
+ "lighteval|mmlu:clinical_knowledge|5": {
+ "acc": 0.4641509433962264,
+ "acc_stderr": 0.030693675018458003
+ },
+ "lighteval|mmlu:college_biology|5": {
+ "acc": 0.4513888888888889,
+ "acc_stderr": 0.04161402398403279
+ },
+ "lighteval|mmlu:college_chemistry|5": {
+ "acc": 0.28,
+ "acc_stderr": 0.04512608598542126
+ },
+ "lighteval|mmlu:college_computer_science|5": {
+ "acc": 0.3,
+ "acc_stderr": 0.046056618647183814
+ },
+ "lighteval|mmlu:college_mathematics|5": {
+ "acc": 0.31,
+ "acc_stderr": 0.04648231987117316
+ },
+ "lighteval|mmlu:college_medicine|5": {
+ "acc": 0.3930635838150289,
+ "acc_stderr": 0.0372424959581773
+ },
+ "lighteval|mmlu:college_physics|5": {
+ "acc": 0.14705882352941177,
+ "acc_stderr": 0.035240689515674495
+ },
+ "lighteval|mmlu:computer_security|5": {
+ "acc": 0.52,
+ "acc_stderr": 0.050211673156867795
+ },
+ "lighteval|mmlu:conceptual_physics|5": {
+ "acc": 0.3404255319148936,
+ "acc_stderr": 0.030976692998534422
+ },
+ "lighteval|mmlu:econometrics|5": {
+ "acc": 0.2982456140350877,
+ "acc_stderr": 0.04303684033537314
+ },
+ "lighteval|mmlu:electrical_engineering|5": {
+ "acc": 0.4413793103448276,
+ "acc_stderr": 0.04137931034482758
+ },
+ "lighteval|mmlu:elementary_mathematics|5": {
+ "acc": 0.2857142857142857,
+ "acc_stderr": 0.023266512213730578
+ },
+ "lighteval|mmlu:formal_logic|5": {
+ "acc": 0.31746031746031744,
+ "acc_stderr": 0.04163453031302859
+ },
+ "lighteval|mmlu:global_facts|5": {
+ "acc": 0.3,
+ "acc_stderr": 0.046056618647183814
+ },
+ "lighteval|mmlu:high_school_biology|5": {
+ "acc": 0.38064516129032255,
+ "acc_stderr": 0.027621717832907046
+ },
+ "lighteval|mmlu:high_school_chemistry|5": {
+ "acc": 0.33497536945812806,
+ "acc_stderr": 0.033208527423483104
+ },
+ "lighteval|mmlu:high_school_computer_science|5": {
+ "acc": 0.42,
+ "acc_stderr": 0.049604496374885836
+ },
+ "lighteval|mmlu:high_school_european_history|5": {
+ "acc": 0.3878787878787879,
+ "acc_stderr": 0.0380491365397101
+ },
+ "lighteval|mmlu:high_school_geography|5": {
+ "acc": 0.51010101010101,
+ "acc_stderr": 0.035616254886737454
+ },
+ "lighteval|mmlu:high_school_government_and_politics|5": {
+ "acc": 0.46632124352331605,
+ "acc_stderr": 0.036002440698671784
+ },
+ "lighteval|mmlu:high_school_macroeconomics|5": {
+ "acc": 0.38974358974358975,
+ "acc_stderr": 0.024726967886647078
+ },
+ "lighteval|mmlu:high_school_mathematics|5": {
+ "acc": 0.26296296296296295,
+ "acc_stderr": 0.026842057873833706
+ },
+ "lighteval|mmlu:high_school_microeconomics|5": {
+ "acc": 0.3739495798319328,
+ "acc_stderr": 0.031429466378837076
+ },
+ "lighteval|mmlu:high_school_physics|5": {
+ "acc": 0.33774834437086093,
+ "acc_stderr": 0.038615575462551684
+ },
+ "lighteval|mmlu:high_school_psychology|5": {
+ "acc": 0.5192660550458715,
+ "acc_stderr": 0.02142140298254888
+ },
+ "lighteval|mmlu:high_school_statistics|5": {
+ "acc": 0.26851851851851855,
+ "acc_stderr": 0.0302252261600124
+ },
+ "lighteval|mmlu:high_school_us_history|5": {
+ "acc": 0.3137254901960784,
+ "acc_stderr": 0.03256685484460388
+ },
+ "lighteval|mmlu:high_school_world_history|5": {
+ "acc": 0.3333333333333333,
+ "acc_stderr": 0.030685820596610805
+ },
+ "lighteval|mmlu:human_aging|5": {
+ "acc": 0.36771300448430494,
+ "acc_stderr": 0.03236198350928275
+ },
+ "lighteval|mmlu:human_sexuality|5": {
+ "acc": 0.4732824427480916,
+ "acc_stderr": 0.04379024936553893
+ },
+ "lighteval|mmlu:international_law|5": {
+ "acc": 0.5867768595041323,
+ "acc_stderr": 0.04495087843548408
+ },
+ "lighteval|mmlu:jurisprudence|5": {
+ "acc": 0.39814814814814814,
+ "acc_stderr": 0.04732332615978815
+ },
+ "lighteval|mmlu:logical_fallacies|5": {
+ "acc": 0.3987730061349693,
+ "acc_stderr": 0.03847021420456023
+ },
+ "lighteval|mmlu:machine_learning|5": {
+ "acc": 0.32142857142857145,
+ "acc_stderr": 0.04432804055291519
+ },
+ "lighteval|mmlu:management|5": {
+ "acc": 0.5436893203883495,
+ "acc_stderr": 0.049318019942204146
+ },
+ "lighteval|mmlu:marketing|5": {
+ "acc": 0.5897435897435898,
+ "acc_stderr": 0.03222414045241108
+ },
+ "lighteval|mmlu:medical_genetics|5": {
+ "acc": 0.43,
+ "acc_stderr": 0.049756985195624284
+ },
+ "lighteval|mmlu:miscellaneous|5": {
+ "acc": 0.5376756066411239,
+ "acc_stderr": 0.017829131764287187
+ },
+ "lighteval|mmlu:moral_disputes|5": {
+ "acc": 0.41329479768786126,
+ "acc_stderr": 0.026511261369409237
+ },
+ "lighteval|mmlu:moral_scenarios|5": {
+ "acc": 0.24804469273743016,
+ "acc_stderr": 0.014444157808261436
+ },
+ "lighteval|mmlu:nutrition|5": {
+ "acc": 0.4869281045751634,
+ "acc_stderr": 0.028620130800700246
+ },
+ "lighteval|mmlu:philosophy|5": {
+ "acc": 0.43729903536977494,
+ "acc_stderr": 0.02817391776176287
+ },
+ "lighteval|mmlu:prehistory|5": {
+ "acc": 0.43209876543209874,
+ "acc_stderr": 0.02756301097160667
+ },
+ "lighteval|mmlu:professional_accounting|5": {
+ "acc": 0.2872340425531915,
+ "acc_stderr": 0.026992199173064356
+ },
+ "lighteval|mmlu:professional_law|5": {
+ "acc": 0.3070404172099087,
+ "acc_stderr": 0.011780959114513769
+ },
+ "lighteval|mmlu:professional_medicine|5": {
+ "acc": 0.29411764705882354,
+ "acc_stderr": 0.027678468642144707
+ },
+ "lighteval|mmlu:professional_psychology|5": {
+ "acc": 0.3660130718954248,
+ "acc_stderr": 0.01948802574552967
+ },
+ "lighteval|mmlu:public_relations|5": {
+ "acc": 0.35454545454545455,
+ "acc_stderr": 0.04582004841505416
+ },
+ "lighteval|mmlu:security_studies|5": {
+ "acc": 0.46938775510204084,
+ "acc_stderr": 0.031949171367580624
+ },
+ "lighteval|mmlu:sociology|5": {
+ "acc": 0.5074626865671642,
+ "acc_stderr": 0.03535140084276719
+ },
+ "lighteval|mmlu:us_foreign_policy|5": {
+ "acc": 0.57,
+ "acc_stderr": 0.04975698519562428
+ },
+ "lighteval|mmlu:virology|5": {
+ "acc": 0.41566265060240964,
+ "acc_stderr": 0.03836722176598053
+ },
+ "lighteval|mmlu:world_religions|5": {
+ "acc": 0.5789473684210527,
+ "acc_stderr": 0.037867207062342145
+ },
+ "lighteval|mmlu:_average|5": {
+ "acc": 0.3925344762957048,
+ "acc_stderr": 0.03579148471683999
+ }
+ },
+ "versions": {
+ "lighteval|mmlu:abstract_algebra|5": 0,
+ "lighteval|mmlu:anatomy|5": 0,
+ "lighteval|mmlu:astronomy|5": 0,
+ "lighteval|mmlu:business_ethics|5": 0,
+ "lighteval|mmlu:clinical_knowledge|5": 0,
+ "lighteval|mmlu:college_biology|5": 0,
+ "lighteval|mmlu:college_chemistry|5": 0,
+ "lighteval|mmlu:college_computer_science|5": 0,
+ "lighteval|mmlu:college_mathematics|5": 0,
+ "lighteval|mmlu:college_medicine|5": 0,
+ "lighteval|mmlu:college_physics|5": 0,
+ "lighteval|mmlu:computer_security|5": 0,
+ "lighteval|mmlu:conceptual_physics|5": 0,
+ "lighteval|mmlu:econometrics|5": 0,
+ "lighteval|mmlu:electrical_engineering|5": 0,
+ "lighteval|mmlu:elementary_mathematics|5": 0,
+ "lighteval|mmlu:formal_logic|5": 0,
+ "lighteval|mmlu:global_facts|5": 0,
+ "lighteval|mmlu:high_school_biology|5": 0,
+ "lighteval|mmlu:high_school_chemistry|5": 0,
+ "lighteval|mmlu:high_school_computer_science|5": 0,
+ "lighteval|mmlu:high_school_european_history|5": 0,
+ "lighteval|mmlu:high_school_geography|5": 0,
+ "lighteval|mmlu:high_school_government_and_politics|5": 0,
+ "lighteval|mmlu:high_school_macroeconomics|5": 0,
+ "lighteval|mmlu:high_school_mathematics|5": 0,
+ "lighteval|mmlu:high_school_microeconomics|5": 0,
+ "lighteval|mmlu:high_school_physics|5": 0,
+ "lighteval|mmlu:high_school_psychology|5": 0,
+ "lighteval|mmlu:high_school_statistics|5": 0,
+ "lighteval|mmlu:high_school_us_history|5": 0,
+ "lighteval|mmlu:high_school_world_history|5": 0,
+ "lighteval|mmlu:human_aging|5": 0,
+ "lighteval|mmlu:human_sexuality|5": 0,
+ "lighteval|mmlu:international_law|5": 0,
+ "lighteval|mmlu:jurisprudence|5": 0,
+ "lighteval|mmlu:logical_fallacies|5": 0,
+ "lighteval|mmlu:machine_learning|5": 0,
+ "lighteval|mmlu:management|5": 0,
+ "lighteval|mmlu:marketing|5": 0,
+ "lighteval|mmlu:medical_genetics|5": 0,
+ "lighteval|mmlu:miscellaneous|5": 0,
+ "lighteval|mmlu:moral_disputes|5": 0,
+ "lighteval|mmlu:moral_scenarios|5": 0,
+ "lighteval|mmlu:nutrition|5": 0,
+ "lighteval|mmlu:philosophy|5": 0,
+ "lighteval|mmlu:prehistory|5": 0,
+ "lighteval|mmlu:professional_accounting|5": 0,
+ "lighteval|mmlu:professional_law|5": 0,
+ "lighteval|mmlu:professional_medicine|5": 0,
+ "lighteval|mmlu:professional_psychology|5": 0,
+ "lighteval|mmlu:public_relations|5": 0,
+ "lighteval|mmlu:security_studies|5": 0,
+ "lighteval|mmlu:sociology|5": 0,
+ "lighteval|mmlu:us_foreign_policy|5": 0,
+ "lighteval|mmlu:virology|5": 0,
+ "lighteval|mmlu:world_religions|5": 0
+ },
+ "config_tasks": {
+ "lighteval|mmlu:abstract_algebra": {
+ "name": "mmlu:abstract_algebra",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "abstract_algebra",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:anatomy": {
+ "name": "mmlu:anatomy",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "anatomy",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 135,
+ "effective_num_docs": 135
+ },
+ "lighteval|mmlu:astronomy": {
+ "name": "mmlu:astronomy",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "astronomy",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 152,
+ "effective_num_docs": 152
+ },
+ "lighteval|mmlu:business_ethics": {
+ "name": "mmlu:business_ethics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "business_ethics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:clinical_knowledge": {
+ "name": "mmlu:clinical_knowledge",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "clinical_knowledge",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 265,
+ "effective_num_docs": 265
+ },
+ "lighteval|mmlu:college_biology": {
+ "name": "mmlu:college_biology",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "college_biology",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 144,
+ "effective_num_docs": 144
+ },
+ "lighteval|mmlu:college_chemistry": {
+ "name": "mmlu:college_chemistry",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "college_chemistry",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:college_computer_science": {
+ "name": "mmlu:college_computer_science",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "college_computer_science",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:college_mathematics": {
+ "name": "mmlu:college_mathematics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "college_mathematics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:college_medicine": {
+ "name": "mmlu:college_medicine",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "college_medicine",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 173,
+ "effective_num_docs": 173
+ },
+ "lighteval|mmlu:college_physics": {
+ "name": "mmlu:college_physics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "college_physics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 102,
+ "effective_num_docs": 102
+ },
+ "lighteval|mmlu:computer_security": {
+ "name": "mmlu:computer_security",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "computer_security",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:conceptual_physics": {
+ "name": "mmlu:conceptual_physics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "conceptual_physics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 235,
+ "effective_num_docs": 235
+ },
+ "lighteval|mmlu:econometrics": {
+ "name": "mmlu:econometrics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "econometrics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 114,
+ "effective_num_docs": 114
+ },
+ "lighteval|mmlu:electrical_engineering": {
+ "name": "mmlu:electrical_engineering",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "electrical_engineering",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 145,
+ "effective_num_docs": 145
+ },
+ "lighteval|mmlu:elementary_mathematics": {
+ "name": "mmlu:elementary_mathematics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "elementary_mathematics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 378,
+ "effective_num_docs": 378
+ },
823
+ "lighteval|mmlu:formal_logic": {
824
+ "name": "mmlu:formal_logic",
825
+ "prompt_function": "mmlu_harness",
826
+ "hf_repo": "lighteval/mmlu",
827
+ "hf_subset": "formal_logic",
828
+ "metric": [
829
+ "loglikelihood_acc"
830
+ ],
831
+ "hf_avail_splits": [
832
+ "auxiliary_train",
833
+ "test",
834
+ "validation",
835
+ "dev"
836
+ ],
837
+ "evaluation_splits": [
838
+ "test"
839
+ ],
840
+ "few_shots_split": "dev",
841
+ "few_shots_select": "sequential",
842
+ "generation_size": 1,
843
+ "stop_sequence": [
844
+ "\n"
845
+ ],
846
+ "output_regex": null,
847
+ "frozen": false,
848
+ "suite": [
849
+ "lighteval",
850
+ "mmlu"
851
+ ],
852
+ "original_num_docs": 126,
853
+ "effective_num_docs": 126
854
+ },
855
+ "lighteval|mmlu:global_facts": {
856
+ "name": "mmlu:global_facts",
857
+ "prompt_function": "mmlu_harness",
858
+ "hf_repo": "lighteval/mmlu",
859
+ "hf_subset": "global_facts",
860
+ "metric": [
861
+ "loglikelihood_acc"
862
+ ],
863
+ "hf_avail_splits": [
864
+ "auxiliary_train",
865
+ "test",
866
+ "validation",
867
+ "dev"
868
+ ],
869
+ "evaluation_splits": [
870
+ "test"
871
+ ],
872
+ "few_shots_split": "dev",
873
+ "few_shots_select": "sequential",
874
+ "generation_size": 1,
875
+ "stop_sequence": [
876
+ "\n"
877
+ ],
878
+ "output_regex": null,
879
+ "frozen": false,
880
+ "suite": [
881
+ "lighteval",
882
+ "mmlu"
883
+ ],
884
+ "original_num_docs": 100,
885
+ "effective_num_docs": 100
886
+ },
887
+ "lighteval|mmlu:high_school_biology": {
888
+ "name": "mmlu:high_school_biology",
889
+ "prompt_function": "mmlu_harness",
890
+ "hf_repo": "lighteval/mmlu",
891
+ "hf_subset": "high_school_biology",
892
+ "metric": [
893
+ "loglikelihood_acc"
894
+ ],
895
+ "hf_avail_splits": [
896
+ "auxiliary_train",
897
+ "test",
898
+ "validation",
899
+ "dev"
900
+ ],
901
+ "evaluation_splits": [
902
+ "test"
903
+ ],
904
+ "few_shots_split": "dev",
905
+ "few_shots_select": "sequential",
906
+ "generation_size": 1,
907
+ "stop_sequence": [
908
+ "\n"
909
+ ],
910
+ "output_regex": null,
911
+ "frozen": false,
912
+ "suite": [
913
+ "lighteval",
914
+ "mmlu"
915
+ ],
916
+ "original_num_docs": 310,
917
+ "effective_num_docs": 310
918
+ },
919
+ "lighteval|mmlu:high_school_chemistry": {
920
+ "name": "mmlu:high_school_chemistry",
921
+ "prompt_function": "mmlu_harness",
922
+ "hf_repo": "lighteval/mmlu",
923
+ "hf_subset": "high_school_chemistry",
924
+ "metric": [
925
+ "loglikelihood_acc"
926
+ ],
927
+ "hf_avail_splits": [
928
+ "auxiliary_train",
929
+ "test",
930
+ "validation",
931
+ "dev"
932
+ ],
933
+ "evaluation_splits": [
934
+ "test"
935
+ ],
936
+ "few_shots_split": "dev",
937
+ "few_shots_select": "sequential",
938
+ "generation_size": 1,
939
+ "stop_sequence": [
940
+ "\n"
941
+ ],
942
+ "output_regex": null,
943
+ "frozen": false,
944
+ "suite": [
945
+ "lighteval",
946
+ "mmlu"
947
+ ],
948
+ "original_num_docs": 203,
949
+ "effective_num_docs": 203
950
+ },
951
+ "lighteval|mmlu:high_school_computer_science": {
+ "name": "mmlu:high_school_computer_science",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_computer_science",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:high_school_european_history": {
+ "name": "mmlu:high_school_european_history",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_european_history",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 165,
+ "effective_num_docs": 165
+ },
+ "lighteval|mmlu:high_school_geography": {
+ "name": "mmlu:high_school_geography",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_geography",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 198,
+ "effective_num_docs": 198
+ },
+ "lighteval|mmlu:high_school_government_and_politics": {
+ "name": "mmlu:high_school_government_and_politics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_government_and_politics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 193,
+ "effective_num_docs": 193
+ },
+ "lighteval|mmlu:high_school_macroeconomics": {
+ "name": "mmlu:high_school_macroeconomics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_macroeconomics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 390,
+ "effective_num_docs": 390
+ },
+ "lighteval|mmlu:high_school_mathematics": {
+ "name": "mmlu:high_school_mathematics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_mathematics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 270,
+ "effective_num_docs": 270
+ },
+ "lighteval|mmlu:high_school_microeconomics": {
+ "name": "mmlu:high_school_microeconomics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_microeconomics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 238,
+ "effective_num_docs": 238
+ },
+ "lighteval|mmlu:high_school_physics": {
+ "name": "mmlu:high_school_physics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_physics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 151,
+ "effective_num_docs": 151
+ },
+ "lighteval|mmlu:high_school_psychology": {
+ "name": "mmlu:high_school_psychology",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_psychology",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 545,
+ "effective_num_docs": 545
+ },
+ "lighteval|mmlu:high_school_statistics": {
+ "name": "mmlu:high_school_statistics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_statistics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 216,
+ "effective_num_docs": 216
+ },
+ "lighteval|mmlu:high_school_us_history": {
+ "name": "mmlu:high_school_us_history",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_us_history",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 204,
+ "effective_num_docs": 204
+ },
+ "lighteval|mmlu:high_school_world_history": {
+ "name": "mmlu:high_school_world_history",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "high_school_world_history",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 237,
+ "effective_num_docs": 237
+ },
+ "lighteval|mmlu:human_aging": {
+ "name": "mmlu:human_aging",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "human_aging",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 223,
+ "effective_num_docs": 223
+ },
+ "lighteval|mmlu:human_sexuality": {
+ "name": "mmlu:human_sexuality",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "human_sexuality",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 131,
+ "effective_num_docs": 131
+ },
+ "lighteval|mmlu:international_law": {
+ "name": "mmlu:international_law",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "international_law",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 121,
+ "effective_num_docs": 121
+ },
+ "lighteval|mmlu:jurisprudence": {
+ "name": "mmlu:jurisprudence",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "jurisprudence",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 108,
+ "effective_num_docs": 108
+ },
+ "lighteval|mmlu:logical_fallacies": {
+ "name": "mmlu:logical_fallacies",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "logical_fallacies",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 163,
+ "effective_num_docs": 163
+ },
+ "lighteval|mmlu:machine_learning": {
+ "name": "mmlu:machine_learning",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "machine_learning",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 112,
+ "effective_num_docs": 112
+ },
+ "lighteval|mmlu:management": {
+ "name": "mmlu:management",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "management",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 103,
+ "effective_num_docs": 103
+ },
+ "lighteval|mmlu:marketing": {
+ "name": "mmlu:marketing",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "marketing",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 234,
+ "effective_num_docs": 234
+ },
+ "lighteval|mmlu:medical_genetics": {
+ "name": "mmlu:medical_genetics",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "medical_genetics",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:miscellaneous": {
+ "name": "mmlu:miscellaneous",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "miscellaneous",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 783,
+ "effective_num_docs": 783
+ },
+ "lighteval|mmlu:moral_disputes": {
+ "name": "mmlu:moral_disputes",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "moral_disputes",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 346,
+ "effective_num_docs": 346
+ },
+ "lighteval|mmlu:moral_scenarios": {
+ "name": "mmlu:moral_scenarios",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "moral_scenarios",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 895,
+ "effective_num_docs": 895
+ },
+ "lighteval|mmlu:nutrition": {
+ "name": "mmlu:nutrition",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "nutrition",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 306,
+ "effective_num_docs": 306
+ },
+ "lighteval|mmlu:philosophy": {
+ "name": "mmlu:philosophy",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "philosophy",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 311,
+ "effective_num_docs": 311
+ },
+ "lighteval|mmlu:prehistory": {
+ "name": "mmlu:prehistory",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "prehistory",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 324,
+ "effective_num_docs": 324
+ },
+ "lighteval|mmlu:professional_accounting": {
+ "name": "mmlu:professional_accounting",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "professional_accounting",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 282,
+ "effective_num_docs": 282
+ },
+ "lighteval|mmlu:professional_law": {
+ "name": "mmlu:professional_law",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "professional_law",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 1534,
+ "effective_num_docs": 1534
+ },
+ "lighteval|mmlu:professional_medicine": {
+ "name": "mmlu:professional_medicine",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "professional_medicine",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 272,
+ "effective_num_docs": 272
+ },
+ "lighteval|mmlu:professional_psychology": {
+ "name": "mmlu:professional_psychology",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "professional_psychology",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 612,
+ "effective_num_docs": 612
+ },
+ "lighteval|mmlu:public_relations": {
+ "name": "mmlu:public_relations",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "public_relations",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 110,
+ "effective_num_docs": 110
+ },
+ "lighteval|mmlu:security_studies": {
+ "name": "mmlu:security_studies",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "security_studies",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 245,
+ "effective_num_docs": 245
+ },
+ "lighteval|mmlu:sociology": {
+ "name": "mmlu:sociology",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "sociology",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 201,
+ "effective_num_docs": 201
+ },
+ "lighteval|mmlu:us_foreign_policy": {
+ "name": "mmlu:us_foreign_policy",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "us_foreign_policy",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 100,
+ "effective_num_docs": 100
+ },
+ "lighteval|mmlu:virology": {
+ "name": "mmlu:virology",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "virology",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 166,
+ "effective_num_docs": 166
+ },
+ "lighteval|mmlu:world_religions": {
+ "name": "mmlu:world_religions",
+ "prompt_function": "mmlu_harness",
+ "hf_repo": "lighteval/mmlu",
+ "hf_subset": "world_religions",
+ "metric": [
+ "loglikelihood_acc"
+ ],
+ "hf_avail_splits": [
+ "auxiliary_train",
+ "test",
+ "validation",
+ "dev"
+ ],
+ "evaluation_splits": [
+ "test"
+ ],
+ "few_shots_split": "dev",
+ "few_shots_select": "sequential",
+ "generation_size": 1,
+ "stop_sequence": [
+ "\n"
+ ],
+ "output_regex": null,
+ "frozen": false,
+ "suite": [
+ "lighteval",
+ "mmlu"
+ ],
+ "original_num_docs": 171,
+ "effective_num_docs": 171
+ }
+ },
+ "summary_tasks": {
+ "lighteval|mmlu:abstract_algebra|5": {
+ "hashes": {
+ "hash_examples": "4c76229e00c9c0e9",
+ "hash_full_prompts": "a45d01c3409c889c",
+ "hash_input_tokens": "fc11398ca4e995e6",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:anatomy|5": {
+ "hashes": {
+ "hash_examples": "6a1f8104dccbd33b",
+ "hash_full_prompts": "e245c6600e03cc32",
+ "hash_input_tokens": "0e63aad739f5d777",
+ "hash_cont_tokens": "96c2bab19c75f48d"
+ },
+ "truncated": 0,
+ "non_truncated": 135,
+ "padded": 540,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:astronomy|5": {
+ "hashes": {
+ "hash_examples": "1302effa3a76ce4c",
+ "hash_full_prompts": "390f9bddf857ad04",
+ "hash_input_tokens": "53afd9483d456920",
+ "hash_cont_tokens": "6cc2d6fb43989c46"
+ },
+ "truncated": 0,
+ "non_truncated": 152,
+ "padded": 608,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:business_ethics|5": {
+ "hashes": {
+ "hash_examples": "03cb8bce5336419a",
+ "hash_full_prompts": "5504f893bc4f2fa1",
+ "hash_input_tokens": "1d0d99c2f7f95728",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:clinical_knowledge|5": {
+ "hashes": {
+ "hash_examples": "ffbb9c7b2be257f9",
+ "hash_full_prompts": "106ad0bab4b90b78",
+ "hash_input_tokens": "6abbbf267dbe9940",
+ "hash_cont_tokens": "4566966a1e601b6c"
+ },
+ "truncated": 0,
+ "non_truncated": 265,
+ "padded": 1060,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:college_biology|5": {
+ "hashes": {
+ "hash_examples": "3ee77f176f38eb8e",
+ "hash_full_prompts": "59f9bdf2695cb226",
+ "hash_input_tokens": "803196bfad4a393a",
+ "hash_cont_tokens": "4ea00cd7b2f74799"
+ },
+ "truncated": 0,
+ "non_truncated": 144,
+ "padded": 576,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:college_chemistry|5": {
+ "hashes": {
+ "hash_examples": "ce61a69c46d47aeb",
+ "hash_full_prompts": "3cac9b759fcff7a0",
+ "hash_input_tokens": "87bd9eea77de9a9a",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:college_computer_science|5": {
+ "hashes": {
+ "hash_examples": "32805b52d7d5daab",
+ "hash_full_prompts": "010b0cca35070130",
+ "hash_input_tokens": "b6775c67bfa0c782",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:college_mathematics|5": {
2250
+ "hashes": {
2251
+ "hash_examples": "55da1a0a0bd33722",
2252
+ "hash_full_prompts": "511422eb9eefc773",
2253
+ "hash_input_tokens": "cbd8a9d6bbda7b3c",
2254
+ "hash_cont_tokens": "dadea1de19dee95c"
2255
+ },
2256
+ "truncated": 0,
2257
+ "non_truncated": 100,
2258
+ "padded": 400,
2259
+ "non_padded": 0,
2260
+ "effective_few_shots": 5.0,
2261
+ "num_truncated_few_shots": 0
2262
+ },
+ "lighteval|mmlu:college_medicine|5": {
+ "hashes": {
+ "hash_examples": "c33e143163049176",
+ "hash_full_prompts": "c8cc1a82a51a046e",
+ "hash_input_tokens": "b3c40eab0fb83731",
+ "hash_cont_tokens": "aed3e7fd8adea27e"
+ },
+ "truncated": 0,
+ "non_truncated": 173,
+ "padded": 692,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:college_physics|5": {
+ "hashes": {
+ "hash_examples": "ebdab1cdb7e555df",
+ "hash_full_prompts": "e40721b5059c5818",
+ "hash_input_tokens": "c69c0bfb74e99180",
+ "hash_cont_tokens": "1ca37bb9b8be1c5d"
+ },
+ "truncated": 0,
+ "non_truncated": 102,
+ "padded": 408,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:computer_security|5": {
+ "hashes": {
+ "hash_examples": "a24fd7d08a560921",
+ "hash_full_prompts": "946c9be5964ac44a",
+ "hash_input_tokens": "70914e4af05d09b4",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:conceptual_physics|5": {
+ "hashes": {
+ "hash_examples": "8300977a79386993",
+ "hash_full_prompts": "506a4f6094cc40c9",
+ "hash_input_tokens": "dcb90ef41648f505",
+ "hash_cont_tokens": "26db9e6e7dfdac00"
+ },
+ "truncated": 0,
+ "non_truncated": 235,
+ "padded": 940,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:econometrics|5": {
+ "hashes": {
+ "hash_examples": "ddde36788a04a46f",
+ "hash_full_prompts": "4ed2703f27f1ed05",
+ "hash_input_tokens": "ef8da4b8e9eb5a76",
+ "hash_cont_tokens": "2ef49b394cfb87e1"
+ },
+ "truncated": 0,
+ "non_truncated": 114,
+ "padded": 456,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:electrical_engineering|5": {
+ "hashes": {
+ "hash_examples": "acbc5def98c19b3f",
+ "hash_full_prompts": "d8f4b3e11c23653c",
+ "hash_input_tokens": "1a5e9d41be2d9981",
+ "hash_cont_tokens": "adb5a1c5d57fbb41"
+ },
+ "truncated": 0,
+ "non_truncated": 145,
+ "padded": 580,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:elementary_mathematics|5": {
+ "hashes": {
+ "hash_examples": "146e61d07497a9bd",
+ "hash_full_prompts": "256d111bd15647ff",
+ "hash_input_tokens": "e0d51d86d03e1394",
+ "hash_cont_tokens": "d0782f141bcc895b"
+ },
+ "truncated": 0,
+ "non_truncated": 378,
+ "padded": 1512,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:formal_logic|5": {
+ "hashes": {
+ "hash_examples": "8635216e1909a03f",
+ "hash_full_prompts": "1171d04f3b1a11f5",
+ "hash_input_tokens": "4c75b7f176e01a01",
+ "hash_cont_tokens": "315a91fa1f805c93"
+ },
+ "truncated": 0,
+ "non_truncated": 126,
+ "padded": 504,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:global_facts|5": {
+ "hashes": {
+ "hash_examples": "30b315aa6353ee47",
+ "hash_full_prompts": "a7e56dbc074c7529",
+ "hash_input_tokens": "b83cb180a97c221d",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_biology|5": {
+ "hashes": {
+ "hash_examples": "c9136373af2180de",
+ "hash_full_prompts": "ad6e859ed978e04a",
+ "hash_input_tokens": "179a2ab8e131445a",
+ "hash_cont_tokens": "715bc46d18155135"
+ },
+ "truncated": 0,
+ "non_truncated": 310,
+ "padded": 1240,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_chemistry|5": {
+ "hashes": {
+ "hash_examples": "b0661bfa1add6404",
+ "hash_full_prompts": "6eb9c04bcc8a8f2a",
+ "hash_input_tokens": "1e6a4441b61eb8f6",
+ "hash_cont_tokens": "3d12f9b93cc609a2"
+ },
+ "truncated": 0,
+ "non_truncated": 203,
+ "padded": 812,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_computer_science|5": {
+ "hashes": {
+ "hash_examples": "80fc1d623a3d665f",
+ "hash_full_prompts": "8e51bc91c81cf8dd",
+ "hash_input_tokens": "4df816916ded3a8c",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_european_history|5": {
+ "hashes": {
+ "hash_examples": "854da6e5af0fe1a1",
+ "hash_full_prompts": "664a1f16c9f3195c",
+ "hash_input_tokens": "317d565e995cda09",
+ "hash_cont_tokens": "6d9c47e593859ccd"
+ },
+ "truncated": 0,
+ "non_truncated": 165,
+ "padded": 656,
+ "non_padded": 4,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_geography|5": {
+ "hashes": {
+ "hash_examples": "7dc963c7acd19ad8",
+ "hash_full_prompts": "f3acf911f4023c8a",
+ "hash_input_tokens": "0f17bdb1600d33f7",
+ "hash_cont_tokens": "84097c7fa87dfe61"
+ },
+ "truncated": 0,
+ "non_truncated": 198,
+ "padded": 792,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_government_and_politics|5": {
+ "hashes": {
+ "hash_examples": "1f675dcdebc9758f",
+ "hash_full_prompts": "066254feaa3158ae",
+ "hash_input_tokens": "ac3cca039d98e159",
+ "hash_cont_tokens": "86d43dfe026b5e6e"
+ },
+ "truncated": 0,
+ "non_truncated": 193,
+ "padded": 772,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_macroeconomics|5": {
+ "hashes": {
+ "hash_examples": "2fb32cf2d80f0b35",
+ "hash_full_prompts": "19a7fa502aa85c95",
+ "hash_input_tokens": "3e795472fd70b8e9",
+ "hash_cont_tokens": "99f5469b1de9a21b"
+ },
+ "truncated": 0,
+ "non_truncated": 390,
+ "padded": 1560,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_mathematics|5": {
+ "hashes": {
+ "hash_examples": "fd6646fdb5d58a1f",
+ "hash_full_prompts": "4f704e369778b5b0",
+ "hash_input_tokens": "37e154ab071591d5",
+ "hash_cont_tokens": "e215c84aa19ccb33"
+ },
+ "truncated": 0,
+ "non_truncated": 270,
+ "padded": 1078,
+ "non_padded": 2,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_microeconomics|5": {
+ "hashes": {
+ "hash_examples": "2118f21f71d87d84",
+ "hash_full_prompts": "4350f9e2240f8010",
+ "hash_input_tokens": "02d65d5e1ee6dea9",
+ "hash_cont_tokens": "dc8017437d84c710"
+ },
+ "truncated": 0,
+ "non_truncated": 238,
+ "padded": 952,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_physics|5": {
+ "hashes": {
+ "hash_examples": "dc3ce06378548565",
+ "hash_full_prompts": "5dc0d6831b66188f",
+ "hash_input_tokens": "6f0c932d12edce11",
+ "hash_cont_tokens": "b8152fcdcf86c673"
+ },
+ "truncated": 0,
+ "non_truncated": 151,
+ "padded": 596,
+ "non_padded": 8,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_psychology|5": {
+ "hashes": {
+ "hash_examples": "c8d1d98a40e11f2f",
+ "hash_full_prompts": "af2b097da6d50365",
+ "hash_input_tokens": "0e444eb7ba0a1fb0",
+ "hash_cont_tokens": "ac45cbb9009f81d9"
+ },
+ "truncated": 0,
+ "non_truncated": 545,
+ "padded": 2168,
+ "non_padded": 12,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_statistics|5": {
+ "hashes": {
+ "hash_examples": "666c8759b98ee4ff",
+ "hash_full_prompts": "c757694421d6d68d",
+ "hash_input_tokens": "4e1485b614b2dc7f",
+ "hash_cont_tokens": "9c9b68ee68272b16"
+ },
+ "truncated": 0,
+ "non_truncated": 216,
+ "padded": 864,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_us_history|5": {
+ "hashes": {
+ "hash_examples": "95fef1c4b7d3f81e",
+ "hash_full_prompts": "e34a028d0ddeec5e",
+ "hash_input_tokens": "b836c43a53625ee3",
+ "hash_cont_tokens": "cec285b624c15c10"
+ },
+ "truncated": 0,
+ "non_truncated": 204,
+ "padded": 816,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:high_school_world_history|5": {
+ "hashes": {
+ "hash_examples": "7e5085b6184b0322",
+ "hash_full_prompts": "1fa3d51392765601",
+ "hash_input_tokens": "bb11d024e2405b72",
+ "hash_cont_tokens": "2c02128f8f2f7539"
+ },
+ "truncated": 0,
+ "non_truncated": 237,
+ "padded": 948,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:human_aging|5": {
+ "hashes": {
+ "hash_examples": "c17333e7c7c10797",
+ "hash_full_prompts": "cac900721f9a1a94",
+ "hash_input_tokens": "2a1e5a167a3788c9",
+ "hash_cont_tokens": "faa94c4ec8e7be4e"
+ },
+ "truncated": 0,
+ "non_truncated": 223,
+ "padded": 892,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:human_sexuality|5": {
+ "hashes": {
+ "hash_examples": "4edd1e9045df5e3d",
+ "hash_full_prompts": "0d6567bafee0a13c",
+ "hash_input_tokens": "73b98b906cf7ce7f",
+ "hash_cont_tokens": "d642d34719fa5ff6"
+ },
+ "truncated": 0,
+ "non_truncated": 131,
+ "padded": 524,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:international_law|5": {
+ "hashes": {
+ "hash_examples": "db2fa00d771a062a",
+ "hash_full_prompts": "d018f9116479795e",
+ "hash_input_tokens": "5f7cf71ef19fdf7d",
+ "hash_cont_tokens": "f0d54717d3cdc783"
+ },
+ "truncated": 0,
+ "non_truncated": 121,
+ "padded": 484,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:jurisprudence|5": {
+ "hashes": {
+ "hash_examples": "e956f86b124076fe",
+ "hash_full_prompts": "1487e89a10ec58b7",
+ "hash_input_tokens": "0f30607df3aa1190",
+ "hash_cont_tokens": "d766ae8c3d361559"
+ },
+ "truncated": 0,
+ "non_truncated": 108,
+ "padded": 432,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:logical_fallacies|5": {
+ "hashes": {
+ "hash_examples": "956e0e6365ab79f1",
+ "hash_full_prompts": "677785b2181f9243",
+ "hash_input_tokens": "ac2bcfdf302d6dcd",
+ "hash_cont_tokens": "0fcca855210b4243"
+ },
+ "truncated": 0,
+ "non_truncated": 163,
+ "padded": 652,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:machine_learning|5": {
+ "hashes": {
+ "hash_examples": "397997cc6f4d581e",
+ "hash_full_prompts": "769ee14a2aea49bb",
+ "hash_input_tokens": "3d634b614f766363",
+ "hash_cont_tokens": "8b369a2ff9235b9d"
+ },
+ "truncated": 0,
+ "non_truncated": 112,
+ "padded": 448,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:management|5": {
+ "hashes": {
+ "hash_examples": "2bcbe6f6ca63d740",
+ "hash_full_prompts": "cb1ff9dac9582144",
+ "hash_input_tokens": "d2728b0835c2fa6d",
+ "hash_cont_tokens": "c77ad5f59321afa5"
+ },
+ "truncated": 0,
+ "non_truncated": 103,
+ "padded": 412,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:marketing|5": {
+ "hashes": {
+ "hash_examples": "8ddb20d964a1b065",
+ "hash_full_prompts": "9fc2114a187ad9a2",
+ "hash_input_tokens": "9472fa5111070553",
+ "hash_cont_tokens": "c94db408fe712d9b"
+ },
+ "truncated": 0,
+ "non_truncated": 234,
+ "padded": 936,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:medical_genetics|5": {
+ "hashes": {
+ "hash_examples": "182a71f4763d2cea",
+ "hash_full_prompts": "46a616fa51878959",
+ "hash_input_tokens": "53f9c4977b0be4e0",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 400,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:miscellaneous|5": {
+ "hashes": {
+ "hash_examples": "4c404fdbb4ca57fc",
+ "hash_full_prompts": "0813e1be36dbaae1",
+ "hash_input_tokens": "fca7aac8daf1d0c7",
+ "hash_cont_tokens": "60215a6f77eaf4d9"
+ },
+ "truncated": 0,
+ "non_truncated": 783,
+ "padded": 3132,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:moral_disputes|5": {
+ "hashes": {
+ "hash_examples": "60cbd2baa3fea5c9",
+ "hash_full_prompts": "1d14adebb9b62519",
+ "hash_input_tokens": "e06669b20b6dba74",
+ "hash_cont_tokens": "3ca55f92255c9f21"
+ },
+ "truncated": 0,
+ "non_truncated": 346,
+ "padded": 1384,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:moral_scenarios|5": {
+ "hashes": {
+ "hash_examples": "fd8b0431fbdd75ef",
+ "hash_full_prompts": "b80d3d236165e3de",
+ "hash_input_tokens": "d22a130cb0ce4eec",
+ "hash_cont_tokens": "a82e76a0738dc6ac"
+ },
+ "truncated": 0,
+ "non_truncated": 895,
+ "padded": 3551,
+ "non_padded": 29,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:nutrition|5": {
+ "hashes": {
+ "hash_examples": "71e55e2b829b6528",
+ "hash_full_prompts": "2bfb18e5fab8dea7",
+ "hash_input_tokens": "6213f514742fc41d",
+ "hash_cont_tokens": "b683842a2cf7cdd6"
+ },
+ "truncated": 0,
+ "non_truncated": 306,
+ "padded": 1224,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:philosophy|5": {
+ "hashes": {
+ "hash_examples": "a6d489a8d208fa4b",
+ "hash_full_prompts": "e8c0d5b6dae3ccc8",
+ "hash_input_tokens": "99ddb7e2f24852cc",
+ "hash_cont_tokens": "a545f25ae279a135"
+ },
+ "truncated": 0,
+ "non_truncated": 311,
+ "padded": 1244,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:prehistory|5": {
+ "hashes": {
+ "hash_examples": "6cc50f032a19acaa",
+ "hash_full_prompts": "4a6a1d3ab1bf28e4",
+ "hash_input_tokens": "246ab4e3ab88967a",
+ "hash_cont_tokens": "5a5ebca069b16663"
+ },
+ "truncated": 0,
+ "non_truncated": 324,
+ "padded": 1268,
+ "non_padded": 28,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:professional_accounting|5": {
+ "hashes": {
+ "hash_examples": "50f57ab32f5f6cea",
+ "hash_full_prompts": "e60129bd2d82ffc6",
+ "hash_input_tokens": "aaeb137f42b60e30",
+ "hash_cont_tokens": "e45018e60164d208"
+ },
+ "truncated": 0,
+ "non_truncated": 282,
+ "padded": 1120,
+ "non_padded": 8,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:professional_law|5": {
+ "hashes": {
+ "hash_examples": "a8fdc85c64f4b215",
+ "hash_full_prompts": "0dbb1d9b72dcea03",
+ "hash_input_tokens": "a4dd0c29f47b7e84",
+ "hash_cont_tokens": "b11002d08c03f837"
+ },
+ "truncated": 0,
+ "non_truncated": 1534,
+ "padded": 6136,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:professional_medicine|5": {
+ "hashes": {
+ "hash_examples": "c373a28a3050a73a",
+ "hash_full_prompts": "5e040f9ca68b089e",
+ "hash_input_tokens": "4e14a4f7fcb794ad",
+ "hash_cont_tokens": "11ce4c2ab1132810"
+ },
+ "truncated": 0,
+ "non_truncated": 272,
+ "padded": 1088,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:professional_psychology|5": {
+ "hashes": {
+ "hash_examples": "bf5254fe818356af",
+ "hash_full_prompts": "b386ecda8b87150e",
+ "hash_input_tokens": "d81a045694559382",
+ "hash_cont_tokens": "3835bfc898aacaa0"
+ },
+ "truncated": 0,
+ "non_truncated": 612,
+ "padded": 2448,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:public_relations|5": {
+ "hashes": {
+ "hash_examples": "b66d52e28e7d14e0",
+ "hash_full_prompts": "fe43562263e25677",
+ "hash_input_tokens": "1d492df812b3c419",
+ "hash_cont_tokens": "1692112db1aec618"
+ },
+ "truncated": 0,
+ "non_truncated": 110,
+ "padded": 440,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:security_studies|5": {
+ "hashes": {
+ "hash_examples": "514c14feaf000ad9",
+ "hash_full_prompts": "27d4a2ac541ef4b9",
+ "hash_input_tokens": "edb25052e8b3c231",
+ "hash_cont_tokens": "9801a1ce7f762a8b"
+ },
+ "truncated": 0,
+ "non_truncated": 245,
+ "padded": 980,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:sociology|5": {
+ "hashes": {
+ "hash_examples": "f6c9bc9d18c80870",
+ "hash_full_prompts": "c072ea7d1a1524f2",
+ "hash_input_tokens": "d10e1fc02e9bb000",
+ "hash_cont_tokens": "277e7d5b38c0960d"
+ },
+ "truncated": 0,
+ "non_truncated": 201,
+ "padded": 804,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:us_foreign_policy|5": {
+ "hashes": {
+ "hash_examples": "ed7b78629db6678f",
+ "hash_full_prompts": "341a97ca3e4d699d",
+ "hash_input_tokens": "357e68691f7bb5be",
+ "hash_cont_tokens": "dadea1de19dee95c"
+ },
+ "truncated": 0,
+ "non_truncated": 100,
+ "padded": 397,
+ "non_padded": 3,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:virology|5": {
+ "hashes": {
+ "hash_examples": "bc52ffdc3f9b994a",
+ "hash_full_prompts": "651d471e2eb8b5e9",
+ "hash_input_tokens": "b38fa14ee2b9cc9d",
+ "hash_cont_tokens": "a4a0852e6fb42244"
+ },
+ "truncated": 0,
+ "non_truncated": 166,
+ "padded": 664,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ },
+ "lighteval|mmlu:world_religions|5": {
+ "hashes": {
+ "hash_examples": "ecdb4a4f94f62930",
+ "hash_full_prompts": "3773f03542ce44a3",
+ "hash_input_tokens": "e2e0b330ff7c67d5",
+ "hash_cont_tokens": "c96f2973fdf12010"
+ },
+ "truncated": 0,
+ "non_truncated": 171,
+ "padded": 684,
+ "non_padded": 0,
+ "effective_few_shots": 5.0,
+ "num_truncated_few_shots": 0
+ }
+ },
+ "summary_general": {
+ "hashes": {
+ "hash_examples": "341a076d0beb7048",
+ "hash_full_prompts": "a5c8f2b7ff4f5ae2",
+ "hash_input_tokens": "7d5d2fb20602eddc",
+ "hash_cont_tokens": "28aa09e44eee2d3e"
+ },
+ "truncated": 0,
+ "non_truncated": 14042,
+ "padded": 56074,
+ "non_padded": 94,
+ "num_truncated_few_shots": 0
+ }
+ }