BrownianNotion committed on
Commit 96ab1ef · verified · 1 Parent(s): 8531f36

Upload metrics for baseline

Files changed (1)
  1. README.md +94 -0
README.md ADDED
@@ -0,0 +1,94 @@
+ ---
+ datasets:
+ - mindchain/wikitext2
+ - yahma/alpaca-cleaned
+ metrics:
+ - perplexity
+ - accuracy
+ base_model:
+ - TinyLlama/TinyLlama_v1.1
+
+
+ model-index:
+ - name: TinyLlama_v1.1_mix_wikitext_alpaca_2bit_BitDistiller_baseline
+   results:
+   - task:
+       type: multiple-choice
+       name: QA Benchmarking
+     dataset:
+       type: allenai/arc
+       name: ARC-Challenge
+       config: challenge
+       split: test
+     metrics:
+     - type: accuracy
+       name: Accuracy
+       value: 0.2150170648464164
+     - type: accuracy
+       name: Normalized Accuracy
+       value: 0.24573378839590443
+   - task:
+       type: multiple-choice
+       name: QA Benchmarking
+     dataset:
+       type: hellaswag
+       name: HellaSwag
+       split: test
+     metrics:
+     - type: accuracy
+       name: Accuracy
+       value: 0.3240390360485959
+     - type: accuracy
+       name: Normalized Accuracy
+       value: 0.37333200557657836
+   - task:
+       type: multiple-choice
+       name: QA Benchmarking
+     dataset:
+       type: piqa
+       name: PIQA
+       split: validation
+     metrics:
+     - type: accuracy
+       name: Accuracy
+       value: 0.6082698585418934
+     - type: accuracy
+       name: Normalized Accuracy
+       value: 0.6071817192600653
+   - task:
+       type: multiple-choice
+       name: QA Benchmarking
+     dataset:
+       type: winogrande
+       name: Winogrande
+       split: test
+     metrics:
+     - type: accuracy
+       name: Accuracy
+       value: 0.5201262825572218
+   - task:
+       type: multiple-choice
+       name: QA Benchmarking
+     dataset:
+       type: aggregated
+       name: QA-Avg
+     metrics:
+     - type: accuracy
+       name: QA Average
+       value: 0.4168630604985319
+   - task:
+       type: language-modeling
+       name: Language Modeling
+     dataset:
+       type: wikitext
+       name: WikiText-2
+       split: test
+     metrics:
+     - type: perplexity
+       name: Perplexity
+       value: 22.655162811279297
+
+
+ ---
+
+ TODO: check the splits of each dataset
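
The README does not say how the QA-Avg figure is derived. As a sanity check, the minimal sketch below recomputes it on the assumption that it is the unweighted mean of the four raw (non-normalized) accuracy values listed in the front matter; under that assumption the numbers agree to within floating-point rounding. The variable names (`raw_accuracy`, `reported_qa_avg`, `qa_avg`) are illustrative and do not come from the commit.

```python
import math

# Raw "Accuracy" values copied from the model-index entries in README.md.
raw_accuracy = {
    "ARC-Challenge": 0.2150170648464164,
    "HellaSwag": 0.3240390360485959,
    "PIQA": 0.6082698585418934,
    "Winogrande": 0.5201262825572218,
}

# Value reported under the "QA-Avg" entry.
reported_qa_avg = 0.4168630604985319

# Assumption: QA-Avg is the unweighted mean of the four raw accuracies.
qa_avg = sum(raw_accuracy.values()) / len(raw_accuracy)

# Compare with a tolerance rather than exact equality, since the order of
# floating-point additions can change the last digit.
matches = math.isclose(qa_avg, reported_qa_avg, rel_tol=0.0, abs_tol=1e-12)
print(f"recomputed QA-Avg = {qa_avg!r}, matches reported value: {matches}")
```

If the normalized accuracies were intended instead, the recomputed mean would differ from the reported value, so the check also documents which variant the aggregate appears to use.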