Update README.md
README.md
@@ -246,8 +246,8 @@ See the Falcon 180B model card for an example of this.
 ## Hyperparamters
 
 SFT:
-- **Learning Rate**: 5E-6 (8B), 2E-6 (70B)
-- **Effective Batch Size:** 128
+- **Learning Rate**: 5E-6 (8B), 2E-6 (70B, 405B)
+- **Effective Batch Size:** 128 (8B, 70B), 256 (405B)
 - **Max. Sequence Length:** 4096
 - **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
 - **Learning Rate Schedule:** Linear
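
For reference, the updated SFT values could be captured in a small lookup table like the sketch below. This is only an illustration: `SFTHyperparams`, `SFT_HYPERPARAMS`, and `grad_accum_steps` are hypothetical names that do not appear in the README, and the per-device batch size and device count in the usage note are assumed values, not settings reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTHyperparams:
    """Hypothetical container for the SFT values listed in the README."""
    learning_rate: float
    effective_batch_size: int
    max_seq_length: int = 4096
    loss_accumulation: str = "sum"
    lr_schedule: str = "linear"


# Values copied from the updated README section, keyed by model size.
SFT_HYPERPARAMS = {
    "8B": SFTHyperparams(learning_rate=5e-6, effective_batch_size=128),
    "70B": SFTHyperparams(learning_rate=2e-6, effective_batch_size=128),
    "405B": SFTHyperparams(learning_rate=2e-6, effective_batch_size=256),
}


def grad_accum_steps(effective_batch_size: int,
                     per_device_batch_size: int,
                     num_devices: int) -> int:
    """Accumulation steps needed to reach the effective batch size.

    Assumes plain data parallelism:
    effective = per_device * num_devices * accumulation_steps.
    """
    per_step = per_device_batch_size * num_devices
    if per_step <= 0 or effective_batch_size % per_step != 0:
        raise ValueError(
            "effective batch size must be a positive multiple of "
            "per_device_batch_size * num_devices"
        )
    return effective_batch_size // per_step
```

As a usage example under those assumptions, `grad_accum_steps(SFT_HYPERPARAMS["405B"].effective_batch_size, per_device_batch_size=4, num_devices=8)` evaluates to 8, since 4 × 8 × 8 = 256.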