noneUsername committed on
Commit 9062093 · verified · 1 Parent(s): a2f987b

Update README.md

  1. README.md +51 -1
README.md CHANGED
@@ -39,4 +39,54 @@ vllm (pretrained=/root/autodl-tmp/phi-4-abliterated-869,add_bos_token=true,max_m
  |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
  |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
  |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.918|± |0.0123|
- | | |strict-match | 5|exact_match|↑ |0.916|± |0.0124|
+ | | |strict-match | 5|exact_match|↑ |0.916|± |0.0124|
+
+
+
+
+
+
+
+
+ vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.928|± |0.0164|
+ | | |strict-match | 5|exact_match|↑ |0.928|± |0.0164|
+
+ vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ | 0.93|± |0.0114|
+ | | |strict-match | 5|exact_match|↑ | 0.93|± |0.0114|
+
+ vllm (pretrained=/root/autodl-tmp/phi-4,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.7649|± |0.0137|
+ | - humanities | 2|none | |acc |↑ |0.8103|± |0.0256|
+ | - other | 2|none | |acc |↑ |0.7487|± |0.0287|
+ | - social sciences| 2|none | |acc |↑ |0.8167|± |0.0280|
+ | - stem | 2|none | |acc |↑ |0.7123|± |0.0260|
+
+
+ vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.944|± |0.0146|
+ | | |strict-match | 5|exact_match|↑ |0.944|± |0.0146|
+
+ vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.924|± |0.0119|
+ | | |strict-match | 5|exact_match|↑ |0.924|± |0.0119|
+
+ vllm (pretrained=/root/autodl-tmp/huihui-ai-phi-4-abliterated-W8A8,add_bos_token=true,max_model_len=800,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
+ | Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu | 2|none | |acc |↑ |0.7673|± |0.0136|
+ | - humanities | 2|none | |acc |↑ |0.8154|± |0.0256|
+ | - other | 2|none | |acc |↑ |0.7744|± |0.0285|
+ | - social sciences| 2|none | |acc |↑ |0.8278|± |0.0273|
+ | - stem | 2|none | |acc |↑ |0.6912|± |0.0263|
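
Reading note on the tables above: the gsm8k `Stderr` values appear consistent with the sample standard error of a 0/1 score over `limit` questions, and the gaps between phi-4 and the W8A8 model are small relative to that error. The sketch below is only an illustration of that arithmetic, not part of the commit. The helper names `binomial_stderr` and `z_score` are hypothetical, and the z-score treats the two runs as independent, which is an assumption (both runs likely score the same questions, so a paired comparison would be tighter).

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 scores whose mean is p.

    Assumption: the n - 1 (sample variance) form is used; this reproduces
    the Stderr column above to four decimals but is not confirmed by the commit.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


def z_score(p_a: float, se_a: float, p_b: float, se_b: float) -> float:
    """Difference of two scores in units of their combined standard error.

    Assumption: the two runs are treated as independent samples.
    """
    return (p_a - p_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# gsm8k rows at limit 500 (dtype=bfloat16), copied from the tables above.
phi4 = (0.930, 0.0114)
w8a8 = (0.924, 0.0119)

z = z_score(*phi4, *w8a8)

print(f"recomputed stderr for 0.930 at n=500: {binomial_stderr(0.930, 500):.4f}")
print(f"z-score, phi-4 vs W8A8 on gsm8k: {z:.2f}")
```

A z-score well below 2 in absolute value means the 0.930 vs 0.924 difference sits inside the run-to-run noise at this sample size. The same applies to the limit 250 rows, where the ordering of the two models is reversed.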