noneUsername committed on
Commit bc6991a (verified)
1 Parent(s): 699502b

Create README.md

Files changed (1)
  1. README.md +199 -0
README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ base_model:
+ - RekaAI/reka-flash-3
+ ---
+ vllm (pretrained=/root/autodl-tmp/reka-flash-3,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.720|± |0.0285|
+ | | |strict-match | 5|exact_match|↑ |0.676|± |0.0297|
+
+ vllm (pretrained=/root/autodl-tmp/reka-flash-3,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.724|± |0.0200|
+ | | |strict-match | 5|exact_match|↑ |0.684|± |0.0208|
+
+ vllm (pretrained=/root/autodl-tmp/reka-flash-3,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6480|± |0.0158|
+ | - humanities | 2|none | |acc |↑ |0.6615|± |0.0328|
+ | - other | 2|none | |acc |↑ |0.6667|± |0.0328|
+ | - social sciences| 2|none | |acc |↑ |0.7167|± |0.0334|
+ | - stem | 2|none | |acc |↑ |0.5825|± |0.0284|
+
+
+ vllm (pretrained=/root/autodl-tmp/84-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.700|± |0.0290|
+ | | |strict-match | 5|exact_match|↑ |0.648|± |0.0303|
+
+ vllm (pretrained=/root/autodl-tmp/84-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.692|± |0.0207|
+ | | |strict-match | 5|exact_match|↑ |0.648|± |0.0214|
+
+ vllm (pretrained=/root/autodl-tmp/84-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6515|± |0.0159|
+ | - humanities | 2|none | |acc |↑ |0.6718|± |0.0325|
+ | - other | 2|none | |acc |↑ |0.6718|± |0.0328|
+ | - social sciences| 2|none | |acc |↑ |0.7056|± |0.0341|
+ | - stem | 2|none | |acc |↑ |0.5895|± |0.0286|
+
+
+ vllm (pretrained=/root/autodl-tmp/848-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.692|± |0.0293|
+ | | |strict-match | 5|exact_match|↑ |0.660|± |0.0300|
+
+ vllm (pretrained=/root/autodl-tmp/848-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.724|± | 0.020|
+ | | |strict-match | 5|exact_match|↑ |0.674|± | 0.021|
+
+ vllm (pretrained=/root/autodl-tmp/848-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6398|± |0.0159|
+ | - humanities | 2|none | |acc |↑ |0.6513|± |0.0333|
+ | - other | 2|none | |acc |↑ |0.6564|± |0.0330|
+ | - social sciences| 2|none | |acc |↑ |0.7222|± |0.0333|
+ | - stem | 2|none | |acc |↑ |0.5684|± |0.0284|
+
+
+ vllm (pretrained=/root/autodl-tmp/8485-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.732|± |0.0281|
+ | | |strict-match | 5|exact_match|↑ |0.696|± |0.0292|
+
+ vllm (pretrained=/root/autodl-tmp/8485-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.720|± |0.0201|
+ | | |strict-match | 5|exact_match|↑ |0.692|± |0.0207|
+
+ vllm (pretrained=/root/autodl-tmp/8485-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6550|± |0.0158|
+ | - humanities | 2|none | |acc |↑ |0.6872|± |0.0323|
+ | - other | 2|none | |acc |↑ |0.6769|± |0.0327|
+ | - social sciences| 2|none | |acc |↑ |0.7056|± |0.0341|
+ | - stem | 2|none | |acc |↑ |0.5860|± |0.0284|
+
+
+ vllm (pretrained=/root/autodl-tmp/8485-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ | 0.74|± |0.0278|
+ | | |strict-match | 5|exact_match|↑ | 0.68|± |0.0296|
+
+ vllm (pretrained=/root/autodl-tmp/8485-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.714|± |0.0202|
+ | | |strict-match | 5|exact_match|↑ |0.676|± |0.0210|
+
+ vllm (pretrained=/root/autodl-tmp/8485-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6433|± |0.0160|
+ | - humanities | 2|none | |acc |↑ |0.6513|± |0.0337|
+ | - other | 2|none | |acc |↑ |0.6615|± |0.0332|
+ | - social sciences| 2|none | |acc |↑ |0.7111|± |0.0338|
+ | - stem | 2|none | |acc |↑ |0.5825|± |0.0284|
+
+
+ vllm (pretrained=/root/autodl-tmp/85-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.696|± |0.0292|
+ | | |strict-match | 5|exact_match|↑ |0.648|± |0.0303|
+
+ vllm (pretrained=/root/autodl-tmp/85-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.708|± |0.0204|
+ | | |strict-match | 5|exact_match|↑ |0.660|± |0.0212|
+
+ vllm (pretrained=/root/autodl-tmp/85-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6526|± |0.0158|
+ | - humanities | 2|none | |acc |↑ |0.6615|± |0.0331|
+ | - other | 2|none | |acc |↑ |0.6769|± |0.0325|
+ | - social sciences| 2|none | |acc |↑ |0.7389|± |0.0327|
+ | - stem | 2|none | |acc |↑ |0.5754|± |0.0287|
+
+
+ vllm (pretrained=/root/autodl-tmp/85-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.708|± |0.0288|
+ | | |strict-match | 5|exact_match|↑ |0.648|± |0.0303|
+
+ vllm (pretrained=/root/autodl-tmp/85-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.720|± |0.0201|
+ | | |strict-match | 5|exact_match|↑ |0.658|± |0.0212|
+
+ vllm (pretrained=/root/autodl-tmp/85-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6550|± |0.0158|
+ | - humanities | 2|none | |acc |↑ |0.6769|± |0.0324|
+ | - other | 2|none | |acc |↑ |0.6667|± |0.0331|
+ | - social sciences| 2|none | |acc |↑ |0.7278|± |0.0329|
+ | - stem | 2|none | |acc |↑ |0.5860|± |0.0284|
+
+
+ vllm (pretrained=/root/autodl-tmp/86-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.696|± |0.0292|
+ | | |strict-match | 5|exact_match|↑ |0.636|± |0.0305|
+
+ vllm (pretrained=/root/autodl-tmp/86-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.690|± |0.0207|
+ | | |strict-match | 5|exact_match|↑ |0.648|± |0.0214|
+
+ vllm (pretrained=/root/autodl-tmp/86-128,add_bos_token=true,max_model_len=4096,dtype=bfloat16,max_num_seqs=3), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6398|± |0.0160|
+ | - humanities | 2|none | |acc |↑ |0.6410|± |0.0336|
+ | - other | 2|none | |acc |↑ |0.6564|± |0.0332|
+ | - social sciences| 2|none | |acc |↑ |0.7278|± |0.0333|
+ | - stem | 2|none | |acc |↑ |0.5719|± |0.0284|
+
+
+ vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.688|± |0.0294|
+ | | |strict-match | 5|exact_match|↑ |0.640|± |0.0304|
+
+ vllm (pretrained=/root/autodl-tmp/86-512,add_bos_token=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.706|± |0.0204|
+ | | |strict-match | 5|exact_match|↑ |0.660|± |0.0212|
+
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.6526|± |0.0158|
+ | - humanities | 2|none | |acc |↑ |0.6821|± |0.0327|
+ | - other | 2|none | |acc |↑ |0.6615|± |0.0331|
+ | - social sciences| 2|none | |acc |↑ |0.7278|± |0.0329|
+ | - stem | 2|none | |acc |↑ |0.5789|± |0.0284|
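
When reading the tables above, it may help to check how far apart two checkpoints are relative to the reported `Stderr`. The sketch below is not part of the README; it assumes that the GSM8K `Stderr` values are the sample standard error of a 0/1 metric over `limit` questions (the reported numbers are consistent with that), and it treats two runs as independent, which is only a rough approximation because every run uses the same questions. The function names `binomial_stderr` and `z_score` are hypothetical helpers introduced for illustration.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Sample standard error of a 0/1 metric: sqrt(p * (1 - p) / (n - 1))."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


def z_score(value_a: float, stderr_a: float,
            value_b: float, stderr_b: float) -> float:
    """Difference between two scores in units of their combined standard error.

    Assumption: the two runs are treated as independent samples.
    """
    return (value_a - value_b) / math.hypot(stderr_a, stderr_b)


if __name__ == "__main__":
    # reka-flash-3, GSM8K flexible-extract at limit 250 and limit 500
    print(round(binomial_stderr(0.720, 250), 4))
    print(round(binomial_stderr(0.724, 500), 4))

    # reka-flash-3 vs. 8485-512, GSM8K flexible-extract at limit 500
    print(round(z_score(0.724, 0.0200, 0.720, 0.0201), 2))
```

The first two calls reproduce the 0.0285 and 0.0200 shown for the reka-flash-3 GSM8K rows. A z-score well below 2 in absolute value suggests that the gap between two checkpoints is within sampling noise at these sample sizes. The MMLU group rows aggregate many subtasks, so the simple formula is not expected to match their `Stderr` column.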