rAIfle committed · Commit ae7280e · verified · 1 Parent(s): 858e454

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +226 -0
README.md ADDED
@@ -0,0 +1,226 @@
+ ---
+ license: cc-by-nc-4.0
+ base_model_relation: quantized
+ quantized_by: Quant-Cartel
+ base_model: allenai/Llama-3.1-Tulu-3-70B
+
+ ---
+ ```
+ e88 88e d8
+ d888 888b 8888 8888 ,"Y88b 888 8e d88
+ C8888 8888D 8888 8888 "8" 888 888 88b d88888
+ Y888 888P Y888 888P ,ee 888 888 888 888
+ "88 88" "88 88" "88 888 888 888 888
+ b
+ 8b,
+
+ e88'Y88 d8 888
+ d888 'Y ,"Y88b 888,8, d88 ,e e, 888
+ C8888 "8" 888 888 " d88888 d88 88b 888
+ Y888 ,d ,ee 888 888 888 888 , 888
+ "88,d88 "88 888 888 888 "YeeP" 888
+
+ PROUDLY PRESENTS
+ ```
+ # rAIfle/Llama-3.1-Tulu-3-70B-exl2-longcal
+
+ Quantized using 115 rows of 8192 tokens each from the default ExLlamaV2 calibration dataset.
+
+ Branches:
+ - `main` -- `measurement.json`
+ - `8.0b8h` -- 8.0bpw, 8-bit lm_head
+ - `6.0b6h` -- 6.0bpw, 6-bit lm_head
+ - `5.0b6h` -- 5.0bpw, 6-bit lm_head
+ - `4.5b6h` -- 4.5bpw, 6-bit lm_head
+ - `4.0b6h` -- 4.0bpw, 6-bit lm_head
+ - `3.0b6h` -- 3.0bpw, 6-bit lm_head
+ - `2.25b6h` -- 2.25bpw, 6-bit lm_head
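
The branch names encode the quantization settings: the number before `b` is the average bits per weight and the number before `h` is the lm_head bit width. As a rough sketch (not part of ExLlamaV2), the helper below parses a branch name and estimates the size of the quantized weights. It assumes a nominal 70 billion parameters and ignores the lm_head, embeddings, KV cache, and activations, so real memory use will be higher. `parse_branch` and `estimate_weight_gb` are hypothetical names used only for illustration.

```python
import re

# Assumption: nominal parameter count for a "70B" model; the exact count differs slightly.
NOMINAL_PARAMS = 70e9


def parse_branch(name):
    """Split a branch name such as '4.5b6h' into (bits_per_weight, head_bits)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)b(\d+)h", name)
    if match is None:
        raise ValueError(f"not a quantization branch name: {name!r}")
    return float(match.group(1)), int(match.group(2))


def estimate_weight_gb(bits_per_weight, params=NOMINAL_PARAMS):
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for branch in ["8.0b8h", "6.0b6h", "5.0b6h", "4.5b6h", "4.0b6h", "3.0b6h", "2.25b6h"]:
    bpw, head_bits = parse_branch(branch)
    print(f"{branch}: ~{estimate_weight_gb(bpw):.1f} GB of weights, {head_bits}-bit lm_head")
```

To fetch a single branch, its name can be passed as the `revision` argument of `huggingface_hub.snapshot_download`.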
+
+ Original model link: [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B)
+
+ Original model README below.
+
+ -----
+ <img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu3/Tulu3-logo.png" alt="Tulu 3 banner" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
+
+ # Llama-3.1-Tulu-3-70B
+
+ Tülu3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.
+ Tülu3 is designed for state-of-the-art performance on a diverse set of tasks beyond chat, as measured by benchmarks such as MATH, GSM8K, and IFEval.
+
+ ## Model description
+
+ - **Model type:** A model trained on a mix of publicly available, synthetic and human-created datasets.
+ - **Language(s) (NLP):** Primarily English
+ - **License:** Llama 3.1 Community License Agreement
+ - **Finetuned from model:** allenai/Llama-3.1-Tulu-3-70B-DPO
+
+ ### Model Sources
+
+ - **Training Repository:** https://github.com/allenai/open-instruct
+ - **Eval Repository:** https://github.com/allenai/olmes
+ - **Paper:** https://arxiv.org/abs/2411.15124
+ - **Demo:** https://playground.allenai.org/
+
+ ### Model Family
+
+ | **Stage** | **Llama 3.1 8B** | **Llama 3.1 70B** |
+ |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
+ | **Base Model** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |
+ | **SFT** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) |
+ | **DPO** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) |
+ | **Final Models (RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) |
+ | **Reward Model (RM)**| [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (Same as 8B) |
+
+ ## Using the model
+
+ ### Loading with HuggingFace
+
+ To load the model with HuggingFace, use the following snippet:
+ ```
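+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the tokenizer (it carries the chat template) and the model weights.
+ tokenizer = AutoTokenizer.from_pretrained("allenai/Llama-3.1-Tulu-3-70B")
+ tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-70B")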
+ ```
+
+ ### vLLM
+
+ Because it is built on a Llama base model, the model can be served with vLLM:
+ ```
+ vllm serve allenai/Llama-3.1-Tulu-3-70B
+ ```
+ Note that given the long chat template of Llama, you may want to use `--max_model_len=8192`.
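
Once the server is running, it can be queried over HTTP. The following is a minimal sketch using only the Python standard library. It assumes the OpenAI-compatible server started by `vllm serve` is listening at `http://localhost:8000`; `build_chat_request` and `send_chat_request` are hypothetical helper names, and the sampling values are placeholders rather than recommended settings.

```python
import json
import urllib.request

MODEL = "allenai/Llama-3.1-Tulu-3-70B"
BASE_URL = "http://localhost:8000"  # assumption: default host and port of `vllm serve`


def build_chat_request(user_message, max_tokens=256, temperature=0.7):
    """Build the JSON body for the OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def send_chat_request(payload, base_url=BASE_URL):
    """POST the payload to a running server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    payload = build_chat_request("How are you doing?")
    print(json.dumps(payload, indent=2))
    # Uncomment once `vllm serve` is running:
    # print(send_chat_request(payload))
```

The server applies the chat template itself, so the request carries plain role/content messages rather than a pre-formatted prompt.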
+
+ ### Chat template
+
+ The chat template for our models is formatted as:
+ ```
+ <|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
+ ```
+ Or with new lines expanded:
+ ```
+ <|user|>
+ How are you doing?
+ <|assistant|>
+ I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
+ ```
+ The template is also embedded in the tokenizer, so `tokenizer.apply_chat_template` produces this format directly.
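
For inspecting prompts without loading the tokenizer, the layout shown above can be reproduced by hand. This is a minimal sketch that mirrors only the single-turn example in this section; `format_tulu_chat` is a hypothetical helper, and how it joins several turns is an assumption. The template embedded in the tokenizer remains the authoritative version.

```python
EOS_TOKEN = "<|endoftext|>"


def format_tulu_chat(messages, add_generation_prompt=False):
    """Render user/assistant messages in the layout shown in the example above."""
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            # User turns end with a newline before the next role tag.
            parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            # Assistant turns are closed by the end-of-text token.
            parts.append(f"<|assistant|>\n{content}{EOS_TOKEN}")
        else:
            # Other roles are not covered by the example, so they are rejected here.
            raise ValueError(f"unsupported role: {role!r}")
    if add_generation_prompt:
        # Leave the assistant tag open so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_tulu_chat(
    [{"role": "user", "content": "How are you doing?"}],
    add_generation_prompt=True,
)
print(repr(prompt))
```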
+
+ ### System prompt
+
+ In Ai2 demos, we use this system prompt by default:
+ ```
+ You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
+ ```
+ The model has not been trained with a specific system prompt in mind.
+
+ ### Bias, Risks, and Limitations
+
+ The Tülu3 models have received only limited safety training and, unlike ChatGPT, are not deployed with automatic in-the-loop filtering of responses, so they can produce problematic outputs (especially when prompted to do so).
+ The size and composition of the corpus used to train the base Llama 3.1 models are also unknown; however, it likely included a mix of web data and technical sources such as books and code.
+ See the Falcon 180B model card for an example of this.
+
+
+ ## Performance
+
+ | Benchmark (eval) | Tülu 3 SFT 8B | Tülu 3 DPO 8B | Tülu 3 8B | Llama 3.1 8B Instruct | Qwen 2.5 7B Instruct | Magpie 8B | Gemma 2 9B Instruct | Ministral 8B Instruct |
+ |---------------------------------|----------------|----------------|------------|------------------------|----------------------|-----------|---------------------|-----------------------|
+ | **Avg.** | 60.4 | 64.4 | **64.8** | 62.2 | 57.8 | 44.7 | 55.2 | 58.3 |
+ | **MMLU (0 shot, CoT)** | 65.9 | 68.7 | 68.2 | 71.2 | **76.6** | 62.0 | 74.6 | 68.5 |
+ | **PopQA (15 shot)** | **29.3** | 29.3 | 29.1 | 20.2 | 18.1 | 22.5 | 28.3 | 20.2 |
+ | **TruthfulQA (6 shot)** | 46.8 | 56.1 | 55.0 | 55.1 | **63.1** | 57.0 | 61.4 | 55.5 |
+ | **BigBenchHard (3 shot, CoT)** | **67.9** | 65.8 | 66.0 | 62.8 | 21.7 | 0.9 | 2.5 | 56.2 |
+ | **DROP (3 shot)** | 61.3 | 62.5 | **62.6** | 61.5 | 54.4 | 49.4 | 58.8 | 56.2 |
+ | **MATH (4 shot CoT, Flex)** | 31.5 | 42.0 | **43.7** | 42.5 | 14.8 | 5.1 | 29.8 | 40.0 |
+ | **GSM8K (8 shot, CoT)** | 76.2 | 84.3 | **87.6** | 83.4 | 83.8 | 61.2 | 79.7 | 80.0 |
+ | **HumanEval (pass@10)** | 86.2 | 83.9 | 83.9 | 86.3 | **93.1** | 75.4 | 71.7 | 91.0 |
+ | **HumanEval+ (pass@10)** | 81.4 | 78.6 | 79.2 | 82.9 | **89.7** | 69.1 | 67.0 | 88.5 |
+ | **IFEval (prompt loose)** | 72.8 | 81.1 | **82.4** | 80.6 | 74.7 | 38.8 | 69.9 | 56.4 |
+ | **AlpacaEval 2 (LC % win)** | 12.4 | 33.5 | 34.5 | 24.2 | 29.0 | **49.0** | 43.7 | 31.4 |
+ | **Safety (6 task avg.)** | **93.1** | 87.2 | 85.5 | 75.2 | 75.0 | 46.4 | 75.5 | 56.2 |
+
+ | Benchmark (eval) | Tülu 3 SFT 70B | Tülu 3 DPO 70B | Tülu 3 70B | Llama 3.1 70B Instruct | Qwen 2.5 72B Instruct | Hermes 3 Llama 3.1 70B | Nemotron Llama 3.1 70B |
+ |---------------------------------|-----------------|-----------------|-------------|-------------------------|-----------------------|------------------------|-------------------------|
+ | **Avg.** | 72.6 | 75.9 | **76.0** | 73.4 | 71.5 | 68.3 | 65.5 |
+ | **MMLU (0 shot, CoT)** | 78.9 | 83.3 | 83.1 | 85.3 | **85.5** | 80.4 | 83.8 |
+ | **PopQA (15 shot)** | **48.6** | 46.3 | 46.5 | 46.4 | 30.6 | 48.1 | 36.4 |
+ | **TruthfulQA (6 shot)** | 55.7 | 67.9 | 67.6 | 66.8 | **69.9** | 66.5 | 62.6 |
+ | **BigBenchHard (3 shot, CoT)** | **82.7** | 81.8 | 82.0 | 73.8 | 67.2 | 82.1 | 0.7 |
+ | **DROP (3 shot)** | **77.2** | 74.1 | 74.3 | 77.0 | 34.2 | 73.2 | 68.8 |
+ | **MATH (4 shot CoT, Flex)** | 53.7 | 62.3 | 63.0 | 56.4 | **74.3** | 41.9 | 55.0 |
+ | **GSM8K (8 shot, CoT)** | 91.1 | 93.5 | 93.5 | **93.7** | 89.5 | 90.0 | 84.7 |
+ | **HumanEval (pass@10)** | 92.9 | 92.4 | 92.4 | 93.6 | 94.0 | 89.6 | **94.1** |
+ | **HumanEval+ (pass@10)** | 87.3 | 88.4 | 88.0 | 89.5 | **90.8** | 85.9 | 85.5 |
+ | **IFEval (prompt loose)** | 82.1 | 82.6 | 83.2 | **88.0** | 87.6 | 76.0 | 79.9 |
+ | **AlpacaEval 2 (LC % win)** | 26.3 | 49.6 | 49.8 | 33.4 | 47.7 | 28.4 | **66.1** |
+ | **Safety (6 task avg.)** | **94.4** | 89.0 | 88.3 | 76.5 | 87.0 | 57.9 | 69.0 |
+
+
+ ## Hyperparameters
+
+ PPO settings for RLVR:
+ - **Learning Rate**: 3 × 10⁻⁷
+ - **Discount Factor (gamma)**: 1.0
+ - **Generalized Advantage Estimation (lambda)**: 0.95
+ - **Mini-batches (N_mb)**: 1
+ - **PPO Update Iterations (K)**: 4
+ - **PPO's Clipping Coefficient (epsilon)**: 0.2
+ - **Value Function Coefficient (c1)**: 0.1
+ - **Gradient Norm Threshold**: 1.0
+ - **Learning Rate Schedule**: Linear
+ - **Generation Temperature**: 1.0
+ - **Batch Size (effective)**: 512
+ - **Max Token Length**: 2,048
+ - **Max Prompt Token Length**: 2,048
+ - **Penalty Reward Value for Responses without an EOS Token**: -10.0
+ - **Response Length**: 1,024 (but 2,048 for MATH)
+ - **Total Episodes**: 100,000
+ - **KL penalty coefficient (beta)**: [0.1, 0.05, 0.03, 0.01]
+ - **Warm up ratio (omega)**: 0.0
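
To make the roles of gamma, lambda, and epsilon concrete, the sketch below implements textbook Generalized Advantage Estimation and the PPO clipped objective in plain Python, using the values listed above as defaults. It is an illustration of the standard formulas under the assumption of a single finished episode, not the open-instruct training code; `compute_gae` and `clipped_policy_objective` are hypothetical names.

```python
def compute_gae(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one finished episode.

    rewards[t] and values[t] are the per-step reward and value estimate.
    The value after the final step is taken to be 0 because the episode ended.
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages


def clipped_policy_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate for one sample (to be maximized)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# A reward that arrives only at the last step, with a zero value baseline.
advantages = compute_gae(rewards=[0.0, 0.0, 1.0], values=[0.0, 0.0, 0.0])
print(advantages)
print(clipped_policy_objective(ratio=1.5, advantage=1.0))
```

With gamma = 1.0 the reward is not discounted over time, so only lambda controls how strongly a final reward is credited to earlier steps.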
+
+ ## License and use
+
+ All Llama 3.1 Tülu3 models are released under Meta's [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/).
+ Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc.
+ Tülu3 is intended for research and educational use.
+ For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).
+
+ The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms:
+ [Gemma Terms of Use](https://ai.google.dev/gemma/terms) and [Qwen License Agreement](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE) (models were improved using Qwen 2.5).
+
+
+ ## Citation
+
+ If Tülu3 or any of the related materials were helpful to your work, please cite:
+ ```
+ @article{lambert2024tulu3,
+ title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
+ author = {
+ Nathan Lambert and
+ Jacob Morrison and
+ Valentina Pyatkin and
+ Shengyi Huang and
+ Hamish Ivison and
+ Faeze Brahman and
+ Lester James V. Miranda and
+ Alisa Liu and
+ Nouha Dziri and
+ Shane Lyu and
+ Yuling Gu and
+ Saumya Malik and
+ Victoria Graf and
+ Jena D. Hwang and
+ Jiangjiang Yang and
+ Ronan Le Bras and
+ Oyvind Tafjord and
+ Chris Wilhelm and
+ Luca Soldaini and
+ Noah A. Smith and
+ Yizhong Wang and
+ Pradeep Dasigi and
+ Hannaneh Hajishirzi
+ },
+ year = {2024},
+ email = {[email protected]}
+ }
+ ```