Commit 15e3f80 by vwxyzjn (verified) · 1 parent: 031bfd6

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -151,8 +151,8 @@ See the Falcon 180B model card for an example of this.
 ## Hyperparamters
 
 SFT:
-- **Learning Rate**: 5E-6 (8B), 2E-6 (70B)
-- **Effective Batch Size:** 128
+- **Learning Rate**: 5E-6 (8B), 2E-6 (70B, 405B)
+- **Effective Batch Size:** 128 (8B, 70B), 256 (405B)
 - **Max. Sequence Length:** 4096
 - **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
 - **Learning Rate Schedule:** Linear
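
For readers who want the updated SFT settings in machine-readable form, here is a minimal sketch that encodes the values from the new side of this diff as a per-model-size lookup. The names `SFT_HYPERPARAMS`, `SHARED`, `sft_config`, and `grad_accum_steps` are hypothetical and do not come from the README or any training library. The per-device batch size and device count are assumptions too: the README states only the effective batch size, not how it is split across devices.

```python
# Values copied from the updated README lines; all names are hypothetical.
SFT_HYPERPARAMS = {
    "8B": {"learning_rate": 5e-6, "effective_batch_size": 128},
    "70B": {"learning_rate": 2e-6, "effective_batch_size": 128},
    "405B": {"learning_rate": 2e-6, "effective_batch_size": 256},
}

# Settings the README lists without a per-size breakdown.
SHARED = {
    "max_seq_length": 4096,
    "loss_accumulation": "sum",
    "lr_schedule": "linear",
}


def sft_config(model_size: str) -> dict:
    """Merge the shared and per-size SFT settings into one dict."""
    return {**SHARED, **SFT_HYPERPARAMS[model_size]}


def grad_accum_steps(effective_batch_size: int, per_device_batch: int, num_devices: int) -> int:
    """Return the accumulation steps needed so that
    per_device_batch * num_devices * steps == effective_batch_size.

    The device layout is an assumption; the README gives only the
    effective batch size.
    """
    per_step = per_device_batch * num_devices
    if effective_batch_size % per_step != 0:
        raise ValueError(
            "effective batch size must be divisible by per_device_batch * num_devices"
        )
    return effective_batch_size // per_step
```

As a usage example under assumed hardware, `grad_accum_steps(sft_config("405B")["effective_batch_size"], per_device_batch=1, num_devices=32)` gives 8 accumulation steps.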