dfurman committed
Commit f63b23e · 1 Parent(s): dff6033

Update README.md

Files changed (1)
  1. README.md +47 -47
README.md CHANGED
@@ -44,53 +44,6 @@ This instruction-following llm was built via parameter-efficient QLoRA finetunin
 
  We use Eleuther.AI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
- ## Training
-
- It took ~1 hour to train 1 epoch on 1x A100.
-
- Prompt format:
- This model (and all my future releases) uses the [ChatML](https://huggingface.co/docs/transformers/chat_templating#what-template-should-i-use) prompt format, which was developed by OpenAI.
-
- ```
- <|im_start|>system
- You are a helpful assistant.<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant
- ```
-
- ### Training Hyperparameters
-
- We use the [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer) from `trl` to fine-tune llms on instruction-following datasets.
-
- The following `TrainingArguments` config was used:
-
- - num_train_epochs = 1
- - auto_find_batch_size = True
- - gradient_accumulation_steps = 1
- - optim = "paged_adamw_32bit"
- - save_strategy = "epoch"
- - learning_rate = 3e-4
- - lr_scheduler_type = "cosine"
- - warmup_ratio = 0.03
- - logging_strategy = "steps"
- - logging_steps = 25
- - bf16 = True
-
- The following `bitsandbytes` quantization config was used:
-
- - quant_method: bitsandbytes
- - load_in_8bit: False
- - load_in_4bit: True
- - llm_int8_threshold: 6.0
- - llm_int8_skip_modules: None
- - llm_int8_enable_fp32_cpu_offload: False
- - llm_int8_has_fp16_weight: False
- - bnb_4bit_quant_type: nf4
- - bnb_4bit_use_double_quant: False
- - bnb_4bit_compute_dtype: bfloat16
-
  ## How to Get Started with the Model
 
  Use the code below to get started with the model.
@@ -180,6 +133,53 @@ Remember, when writing emails, always keep in mind your audience and their prefe
  |:-----------------------------:|:----------------------:|:---------------------:|:-------------:|:-----------------------:|
  | 3.1 | 1x A100 (40 GB SXM) | torch | fp16 | 13 |
 
+ ## Training
+
+ It took ~1 hour to train 1 epoch on 1x A100.
+
+ Prompt format:
+ This model (and all my future releases) uses the [ChatML](https://huggingface.co/docs/transformers/chat_templating#what-template-should-i-use) prompt format, which was developed by OpenAI.
+
+ ```
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
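For illustration, the ChatML wrapping shown above can be sketched as a small helper; the function name and default system message here are illustrative, not part of the model card:

```python
# Minimal sketch: wrap a user prompt in the ChatML format shown above,
# leaving the assistant turn open for the model to complete.
def to_chatml(prompt: str, system: str = "You are a helpful assistant.") -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(to_chatml("Write me a haiku about GPUs."))
```

In practice, a tokenizer that ships a ChatML chat template produces the same layout via `tokenizer.apply_chat_template`.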
+
+ ### Training Hyperparameters
+
+ We use the [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer) from `trl` to fine-tune llms on instruction-following datasets.
+
+ The following `TrainingArguments` config was used:
+
+ - num_train_epochs = 1
+ - auto_find_batch_size = True
+ - gradient_accumulation_steps = 1
+ - optim = "paged_adamw_32bit"
+ - save_strategy = "epoch"
+ - learning_rate = 3e-4
+ - lr_scheduler_type = "cosine"
+ - warmup_ratio = 0.03
+ - logging_strategy = "steps"
+ - logging_steps = 25
+ - bf16 = True
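As a dependency-free sketch, the hyperparameters listed above map one-to-one onto keyword arguments for `transformers.TrainingArguments`; the `output_dir` name and the trainer wiring in the comments are assumptions, not taken from the card:

```python
# Hyperparameters from the list above, ready to splat into
# transformers.TrainingArguments(output_dir="outputs", **training_kwargs),
# which would then be passed to trl's SFTTrainer as its training args.
training_kwargs = dict(
    num_train_epochs=1,
    auto_find_batch_size=True,      # let the trainer probe a batch size that fits
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",      # paged optimizer, a common pairing with QLoRA
    save_strategy="epoch",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_strategy="steps",
    logging_steps=25,
    bf16=True,
)
```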
+
+ The following `bitsandbytes` quantization config was used:
+
+ - quant_method: bitsandbytes
+ - load_in_8bit: False
+ - load_in_4bit: True
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: nf4
+ - bnb_4bit_use_double_quant: False
+ - bnb_4bit_compute_dtype: bfloat16
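The 4-bit settings above correspond to keyword arguments of `transformers.BitsAndBytesConfig`; a dependency-free sketch (the compute dtype is written as a string here, whereas real code would pass `torch.bfloat16`):

```python
# 4-bit NF4 settings from the list above, as BitsAndBytesConfig kwargs.
bnb_kwargs = dict(
    load_in_4bit=True,                  # quantize weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",          # NormalFloat4 data type
    bnb_4bit_use_double_quant=False,    # no nested quantization of the constants
    bnb_4bit_compute_dtype="bfloat16",  # torch.bfloat16 in real code
)
```

The `llm_int8_*` entries in the list appear to be 8-bit options left at their defaults; they do not apply when loading in 4-bit.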
+
 
  ## Model Card Contact