SivilTaram committed a356e6c (verified, 1 parent: 1606665): Update README.md

Files changed (1): README.md (+207 -33)
README.md (after update):
---
license: mit
datasets:
- sail/regmix-data
language:
- en
---

# Models Trained with Random Mixture

## How to Load a Model

You can load any model using the corresponding branch with the Hugging Face Transformers library:

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
tokenizer = AutoTokenizer.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
```
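If you want to generate text directly, the same branch can also be loaded with the causal-LM class. This is a minimal sketch, assuming each branch stores a standard Llama causal-LM checkpoint (not stated explicitly above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the branch contains a standard Llama causal-LM checkpoint.
model = AutoModelForCausalLM.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")
tokenizer = AutoTokenizer.from_pretrained("sail/data-mixture-random-1b", revision="model-index-1")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```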

## Data Mixture

The specific data mixture used for training each 1B model can be found in the file `train_config.yaml` in each corresponding model branch.
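The per-branch mixture can be inspected programmatically, for example by downloading the config with `huggingface_hub`. This is a sketch only; the schema of `train_config.yaml` is not documented here, so the snippet simply prints the parsed file:

```python
import yaml
from huggingface_hub import hf_hub_download

# Download train_config.yaml from one branch (revision) of the repo.
config_path = hf_hub_download(
    repo_id="sail/data-mixture-random-1b",
    filename="train_config.yaml",
    revision="model-index-1",
)

with open(config_path) as f:
    train_config = yaml.safe_load(f)

# The exact keys are not documented in this card; inspect the parsed dict.
print(train_config)
```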

## Model Variants

To access a different model variant, change the `revision` parameter in the `from_pretrained` method to the desired model index (e.g., "model-index-2", "model-index-3"). There are 64 variants in total, so the maximum index is 64.
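For example, all 64 variants can be enumerated by looping over the branch names. A sketch, assuming every branch from `model-index-1` to `model-index-64` exists as stated above:

```python
from transformers import AutoConfig

# Enumerate the 64 random-mixture variants by branch name.
revisions = [f"model-index-{i}" for i in range(1, 65)]

for revision in revisions:
    # Loading only the config is a cheap way to touch each branch.
    config = AutoConfig.from_pretrained("sail/data-mixture-random-1b", revision=revision)
    print(revision, config.model_type)
```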

## Usage Notes

- These models are primarily intended for research purposes.
- Performance may vary depending on the specific task and domain.

## Citation

If you use these models in your research, please cite the RegMix paper:

```
@misc{liu2024regmix,
      title={RegMix: Data Mixture as Regression for Language Model Pre-training},
      author={Qian Liu and Xiaosen Zheng and Niklas Muennighoff and Guangtao Zeng and Longxu Dou and Tianyu Pang and Jing Jiang and Min Lin},
      year={2024},
      eprint={2407.01492},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.01492},
}
```

For more information about the RegMix methodology and its applications, please refer to the [original paper](https://huggingface.co/papers/2407.01492).

## Performance

We evaluated each model using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The performance metric for each task is the average of the 0-shot to 5-shot `acc_norm` (normalized accuracy, where available) or `acc` (accuracy) scores.
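As a concrete illustration of how each reported number is obtained, the sketch below averages 0-shot through 5-shot scores for a single task, preferring `acc_norm` over `acc` when both are present. The values in the list are hypothetical placeholders, not results from this card:

```python
# Hypothetical per-shot results for one task, e.g. collected from six
# lm-evaluation-harness runs with num_fewshot = 0, 1, ..., 5.
shot_results = [
    {"acc": 0.379, "acc_norm": 0.398},  # 0-shot (placeholder values)
    {"acc": 0.381, "acc_norm": 0.401},  # 1-shot
    {"acc": 0.384, "acc_norm": 0.405},  # 2-shot
    {"acc": 0.386, "acc_norm": 0.407},  # 3-shot
    {"acc": 0.388, "acc_norm": 0.410},  # 4-shot
    {"acc": 0.389, "acc_norm": 0.414},  # 5-shot
]

# Prefer acc_norm when available, otherwise fall back to acc.
scores = [r.get("acc_norm", r["acc"]) for r in shot_results]
task_score = 100 * sum(scores) / len(scores)
print(f"{task_score:.2f}")  # the kind of value reported in the tables below
```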

### Table 1: Model Index 1-8

| Task | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|------|---------|---------|---------|---------|---------|---------|---------|---------|
| Social IQA | 33.27 | 33.33 | 33.62 | 33.53 | 33.49 | 33.56 | 33.62 | 33.55 |
| HellaSwag | 40.58 | 36.86 | 40.58 | 36.06 | 40.07 | 37.85 | 37.93 | 39.59 |
| PiQA | 67.29 | 65.14 | 67.97 | 64.66 | 67.03 | 65.36 | 66.00 | 66.55 |
| OpenBookQA | 28.63 | 27.87 | 29.33 | 29.10 | 29.23 | 28.33 | 29.13 | 28.73 |
| Lambada | 29.17 | 26.86 | 31.55 | 27.11 | 29.16 | 28.92 | 31.53 | 30.92 |
| SciQ | 80.68 | 79.98 | 81.05 | 80.80 | 82.40 | 79.88 | 78.67 | 79.70 |
| COPA | 70.50 | 63.83 | 69.17 | 65.00 | 67.50 | 66.00 | 66.67 | 68.67 |
| RACE | 29.47 | 30.00 | 32.11 | 28.82 | 31.13 | 30.06 | 29.90 | 30.75 |
| ARC Easy | 50.03 | 48.72 | 50.01 | 46.64 | 51.06 | 47.46 | 46.75 | 48.39 |
| LogiQA | 23.76 | 24.17 | 25.29 | 25.29 | 24.55 | 25.96 | 25.45 | 26.32 |
| QQP | 55.71 | 55.90 | 54.84 | 56.52 | 54.01 | 56.34 | 52.35 | 54.20 |
| WinoGrande | 51.54 | 51.59 | 51.39 | 50.91 | 53.13 | 52.26 | 51.26 | 51.45 |
| MultiRC | 52.65 | 53.39 | 51.89 | 50.92 | 49.03 | 53.09 | 53.64 | 50.23 |
| **Average** | **47.18** | **45.97** | **47.60** | **45.80** | **47.06** | **46.54** | **46.38** | **46.85** |

### Table 2: Model Index 9-16

| Task | Model 9 | Model 10 | Model 11 | Model 12 | Model 13 | Model 14 | Model 15 | Model 16 |
|------|---------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.43 | 33.21 | 33.31 | 33.17 | 33.28 | 32.43 | 33.57 | 33.70 |
| HellaSwag | 40.05 | 35.89 | 39.55 | 39.89 | 38.63 | 36.18 | 39.52 | 35.94 |
| PiQA | 66.60 | 64.74 | 66.29 | 66.27 | 66.90 | 64.05 | 66.70 | 64.51 |
| OpenBookQA | 28.87 | 26.60 | 29.33 | 28.73 | 29.40 | 27.87 | 29.67 | 27.83 |
| Lambada | 31.39 | 27.37 | 30.32 | 30.31 | 31.38 | 26.25 | 29.86 | 26.95 |
| SciQ | 81.10 | 79.12 | 79.97 | 82.85 | 79.42 | 81.40 | 81.38 | 81.23 |
| COPA | 67.00 | 64.50 | 66.83 | 69.50 | 67.33 | 65.83 | 69.50 | 66.33 |
| RACE | 30.57 | 29.63 | 30.49 | 30.85 | 30.35 | 28.66 | 31.21 | 29.57 |
| ARC Easy | 50.66 | 47.74 | 47.47 | 50.18 | 49.92 | 49.52 | 50.73 | 48.65 |
| LogiQA | 23.60 | 25.65 | 26.37 | 23.81 | 25.58 | 26.29 | 25.86 | 25.12 |
| QQP | 54.89 | 54.79 | 54.20 | 55.23 | 53.69 | 57.09 | 53.95 | 54.24 |
| WinoGrande | 50.83 | 51.84 | 51.05 | 51.83 | 52.12 | 52.00 | 51.01 | 51.82 |
| MultiRC | 54.18 | 54.48 | 50.17 | 52.12 | 51.42 | 52.69 | 51.87 | 53.48 |
| **Average** | **47.17** | **45.81** | **46.57** | **47.29** | **46.88** | **46.17** | **47.30** | **46.11** |

### Table 3: Model Index 17-24

| Task | Model 17 | Model 18 | Model 19 | Model 20 | Model 21 | Model 22 | Model 23 | Model 24 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.89 | 33.31 | 33.53 | 33.38 | 33.75 | 33.24 | 33.56 | 33.71 |
| HellaSwag | 38.68 | 39.90 | 34.67 | 37.12 | 37.44 | 36.07 | 42.15 | 34.67 |
| PiQA | 66.83 | 67.39 | 63.33 | 64.83 | 65.00 | 63.68 | 67.80 | 62.99 |
| OpenBookQA | 28.13 | 30.67 | 28.03 | 29.40 | 27.67 | 27.77 | 29.37 | 25.83 |
| Lambada | 28.78 | 28.56 | 24.13 | 29.41 | 27.67 | 28.03 | 33.47 | 24.04 |
| SciQ | 79.60 | 78.83 | 77.42 | 78.98 | 78.95 | 78.72 | 81.83 | 79.12 |
| COPA | 65.17 | 68.17 | 65.33 | 67.33 | 67.67 | 62.67 | 69.83 | 65.83 |
| RACE | 28.74 | 30.03 | 29.76 | 29.49 | 30.77 | 29.76 | 31.21 | 27.91 |
| ARC Easy | 48.86 | 49.42 | 47.90 | 48.30 | 47.88 | 46.68 | 50.92 | 45.24 |
| LogiQA | 25.91 | 26.34 | 26.24 | 25.76 | 26.11 | 26.24 | 24.17 | 25.91 |
| QQP | 53.35 | 53.18 | 50.61 | 51.49 | 54.27 | 54.99 | 52.77 | 55.19 |
| WinoGrande | 52.54 | 51.17 | 52.01 | 51.09 | 52.13 | 52.03 | 52.50 | 50.28 |
| MultiRC | 51.49 | 52.45 | 55.40 | 54.87 | 51.73 | 49.49 | 50.61 | 50.29 |
| **Average** | **46.30** | **46.88** | **45.26** | **46.27** | **46.23** | **45.34** | **47.71** | **44.69** |

### Table 4: Model Index 25-32

| Task | Model 25 | Model 26 | Model 27 | Model 28 | Model 29 | Model 30 | Model 31 | Model 32 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.51 | 33.40 | 33.59 | 33.52 | 33.53 | 33.49 | 33.16 | 33.56 |
| HellaSwag | 36.75 | 36.97 | 40.81 | 38.25 | 40.28 | 35.71 | 37.37 | 37.39 |
| PiQA | 64.09 | 64.74 | 67.97 | 66.15 | 66.88 | 63.84 | 64.47 | 65.05 |
| OpenBookQA | 29.47 | 28.70 | 29.57 | 29.77 | 29.50 | 29.13 | 29.47 | 28.00 |
| Lambada | 26.69 | 33.00 | 31.60 | 33.08 | 31.49 | 27.69 | 26.99 | 29.54 |
| SciQ | 80.03 | 79.17 | 80.12 | 80.22 | 81.92 | 78.23 | 77.42 | 80.87 |
| COPA | 67.67 | 65.50 | 69.00 | 65.67 | 68.33 | 63.33 | 64.67 | 67.17 |
| RACE | 30.05 | 30.19 | 30.96 | 30.37 | 30.08 | 29.62 | 30.13 | 29.92 |
| ARC Easy | 47.50 | 46.90 | 50.26 | 48.57 | 50.55 | 46.96 | 48.77 | 48.79 |
| LogiQA | 27.24 | 25.55 | 25.86 | 24.37 | 25.32 | 25.12 | 26.40 | 24.30 |
| QQP | 49.68 | 55.43 | 50.94 | 50.91 | 51.99 | 53.53 | 49.53 | 51.36 |
| WinoGrande | 51.68 | 52.12 | 51.93 | 51.50 | 52.32 | 51.67 | 52.13 | 52.63 |
| MultiRC | 51.24 | 51.91 | 50.33 | 52.42 | 52.52 | 54.04 | 52.05 | 53.04 |
| **Average** | **45.82** | **46.43** | **47.15** | **46.52** | **47.29** | **45.57** | **45.58** | **46.28** |

### Table 5: Model Index 33-40

| Task | Model 33 | Model 34 | Model 35 | Model 36 | Model 37 | Model 38 | Model 39 | Model 40 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.48 | 33.28 | 33.35 | 33.29 | 33.63 | 33.61 | 33.21 | 33.61 |
| HellaSwag | 38.00 | 40.18 | 43.37 | 37.69 | 32.96 | 32.98 | 37.31 | 37.79 |
| PiQA | 65.30 | 66.68 | 69.04 | 66.46 | 62.25 | 60.17 | 65.24 | 65.32 |
| OpenBookQA | 29.43 | 30.37 | 30.43 | 27.63 | 26.43 | 26.83 | 27.97 | 28.70 |
| Lambada | 26.59 | 31.46 | 31.71 | 30.21 | 18.92 | 20.29 | 28.10 | 28.58 |
| SciQ | 79.82 | 80.58 | 82.13 | 80.83 | 76.73 | 77.90 | 79.12 | 79.60 |
| COPA | 64.33 | 69.33 | 67.00 | 67.83 | 61.50 | 62.67 | 64.67 | 66.00 |
| RACE | 30.03 | 30.16 | 32.47 | 30.49 | 29.27 | 28.12 | 30.11 | 30.21 |
| ARC Easy | 48.86 | 49.88 | 52.22 | 48.32 | 44.86 | 45.54 | 48.15 | 48.86 |
| LogiQA | 25.91 | 24.30 | 23.35 | 24.96 | 26.19 | 27.68 | 25.47 | 25.37 |
| QQP | 56.06 | 56.56 | 52.57 | 56.70 | 52.54 | 48.04 | 49.81 | 57.12 |
| WinoGrande | 50.92 | 50.97 | 52.39 | 52.70 | 52.30 | 51.68 | 51.42 | 52.80 |
| MultiRC | 53.09 | 49.97 | 52.18 | 49.05 | 53.78 | 52.27 | 51.45 | 55.68 |
| **Average** | **46.29** | **47.21** | **47.86** | **46.63** | **43.95** | **43.67** | **45.54** | **46.90** |

### Table 6: Model Index 41-48

| Task | Model 41 | Model 42 | Model 43 | Model 44 | Model 45 | Model 46 | Model 47 | Model 48 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.49 | 33.43 | 33.07 | 33.28 | 33.44 | 33.08 | 33.78 | 33.17 |
| HellaSwag | 34.51 | 37.59 | 42.69 | 37.37 | 38.31 | 38.30 | 39.67 | 41.07 |
| PiQA | 62.24 | 65.58 | 68.05 | 66.62 | 66.54 | 65.52 | 66.98 | 67.21 |
| OpenBookQA | 27.10 | 28.77 | 28.90 | 28.07 | 28.07 | 27.60 | 31.17 | 29.73 |
| Lambada | 22.78 | 26.99 | 31.34 | 29.51 | 27.87 | 29.47 | 30.34 | 32.71 |
| SciQ | 77.78 | 80.25 | 79.47 | 80.25 | 80.70 | 79.72 | 81.35 | 81.77 |
| COPA | 64.00 | 66.33 | 67.00 | 67.00 | 67.33 | 68.33 | 67.17 | 67.67 |
| RACE | 28.33 | 28.82 | 30.78 | 30.80 | 30.08 | 30.24 | 30.24 | 30.67 |
| ARC Easy | 45.48 | 48.64 | 51.49 | 46.99 | 48.79 | 48.05 | 49.58 | 49.49 |
| LogiQA | 24.83 | 24.96 | 24.76 | 23.25 | 26.06 | 25.55 | 24.32 | 24.68 |
| QQP | 50.27 | 54.73 | 53.96 | 57.00 | 53.73 | 51.19 | 57.52 | 56.91 |
| WinoGrande | 51.79 | 51.63 | 51.32 | 50.76 | 53.18 | 52.45 | 50.72 | 52.24 |
| MultiRC | 54.03 | 53.96 | 48.91 | 50.74 | 53.01 | 50.89 | 47.63 | 53.84 |
| **Average** | **44.35** | **46.28** | **47.06** | **46.28** | **46.70** | **46.18** | **46.96** | **47.78** |

### Table 7: Model Index 49-56

| Task | Model 49 | Model 50 | Model 51 | Model 52 | Model 53 | Model 54 | Model 55 | Model 56 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.53 | 33.74 | 33.37 | 33.41 | 32.96 | 33.88 | 33.75 | 33.79 |
| HellaSwag | 39.09 | 35.65 | 38.68 | 36.07 | 37.68 | 38.53 | 35.40 | 40.50 |
| PiQA | 66.81 | 64.58 | 65.68 | 63.99 | 65.85 | 65.76 | 64.51 | 66.89 |
| OpenBookQA | 29.13 | 27.57 | 28.27 | 29.10 | 29.43 | 28.73 | 28.30 | 29.87 |
| Lambada | 30.23 | 26.19 | 30.29 | 30.84 | 29.76 | 29.03 | 28.63 | 30.74 |
| SciQ | 79.90 | 80.83 | 78.40 | 80.03 | 81.38 | 80.92 | 77.75 | 82.07 |
| COPA | 68.17 | 61.83 | 67.00 | 66.00 | 66.17 | 63.17 | 66.33 | 64.00 |
| RACE | 31.42 | 29.35 | 30.41 | 31.08 | 30.77 | 29.73 | 30.80 | 31.42 |
| ARC Easy | 49.54 | 47.71 | 49.02 | 47.64 | 48.38 | 49.36 | 46.96 | 51.22 |
| LogiQA | 24.99 | 24.58 | 25.32 | 24.91 | 25.17 | 26.22 | 24.63 | 24.91 |
| QQP | 54.06 | 56.48 | 50.96 | 56.62 | 56.45 | 53.86 | 53.85 | 53.26 |
| WinoGrande | 50.51 | 50.26 | 51.83 | 51.33 | 52.18 | 51.89 | 51.59 | 50.50 |
| MultiRC | 50.25 | 54.37 | 50.94 | 52.38 | 51.21 | 55.34 | 54.52 | 50.50 |
| **Average** | **46.74** | **45.63** | **46.17** | **46.42** | **46.72** | **46.65** | **45.92** | **46.90** |

### Table 8: Model Index 57-64

| Task | Model 57 | Model 58 | Model 59 | Model 60 | Model 61 | Model 62 | Model 63 | Model 64 |
|------|----------|----------|----------|----------|----------|----------|----------|----------|
| Social IQA | 33.24 | 33.30 | 33.56 | 33.54 | 33.42 | 33.84 | 33.32 | 33.55 |
| HellaSwag | 41.74 | 39.63 | 35.36 | 38.83 | 38.53 | 36.46 | 38.80 | 36.43 |
| PiQA | 68.07 | 67.31 | 64.44 | 66.38 | 66.50 | 64.74 | 66.54 | 64.87 |
| OpenBookQA | 29.20 | 29.50 | 28.10 | 27.97 | 27.83 | 27.37 | 28.83 | 27.87 |
| Lambada | 31.79 | 31.11 | 27.32 | 30.17 | 28.75 | 26.22 | 30.38 | 26.25 |
| SciQ | 80.42 | 79.83 | 80.85 | 79.60 | 78.93 | 80.05 | 79.50 | 78.65 |
| COPA | 66.17 | 69.00 | 64.00 | 64.83 | 67.00 | 64.00 | 66.00 | 66.83 |
| RACE | 31.39 | 29.82 | 29.67 | 30.08 | 29.98 | 29.46 | 30.37 | 29.19 |
| ARC Easy | 51.14 | 49.24 | 47.13 | 47.88 | 48.20 | 47.09 | 49.09 | 46.90 |
| LogiQA | 25.19 | 25.93 | 23.68 | 25.17 | 25.70 | 25.52 | 26.50 | 26.65 |
| QQP | 55.37 | 54.46 | 52.73 | 53.17 | 59.65 | 58.15 | 57.50 | 55.31 |
| WinoGrande | 53.21 | 51.46 | 50.83 | 52.16 | 52.37 | 51.41 | 51.63 | 51.85 |
| MultiRC | 53.58 | 52.31 | 52.22 | 53.03 | 50.41 | 52.17 | 52.27 | 51.50 |
| **Average** | **47.73** | **47.15** | **45.38** | **46.37** | **46.71** | **45.88** | **46.98** | **45.84** |