Update README.md
README.md
CHANGED
@@ -21,9 +21,9 @@ Doge uses `wsd_scheduler` as the training scheduler, which divides the learning
 
 Here are the initial learning rates required to continue training at each checkpoint:
 
-- **[Doge-20M](https://huggingface.co/
-- **[Doge-60M](https://huggingface.co/
-- **[Doge-160M](https://huggingface.co/
+- **[Doge-20M](https://huggingface.co/SmallDoge/Doge-20M-checkpoint)**: 8e-3
+- **[Doge-60M](https://huggingface.co/SmallDoge/Doge-60M-checkpoint)**: 6e-3
+- **[Doge-160M](https://huggingface.co/SmallDoge/Doge-160M-checkpoint)**: 4e-3
 - **Doge-320M**: 2e-3
 
 | Model | Learning Rate | Schedule | Warmup Steps | Stable Steps |
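
For anyone resuming training from one of these checkpoints, the learning rates listed in the diff can be kept in a small lookup table. The following is only a minimal sketch: the names `RESUME_LEARNING_RATES` and `initial_learning_rate` are hypothetical and are not part of the Doge repository, and the values are simply those shown in the updated README lines.

```python
# Initial learning rates for continuing training, copied from the README lines in the diff.
# RESUME_LEARNING_RATES and initial_learning_rate are hypothetical names used for
# illustration only; they are not part of the Doge codebase.
RESUME_LEARNING_RATES = {
    "Doge-20M": 8e-3,
    "Doge-60M": 6e-3,
    "Doge-160M": 4e-3,
    "Doge-320M": 2e-3,
}


def initial_learning_rate(model_name: str) -> float:
    """Return the learning rate to start from when continuing training."""
    try:
        return RESUME_LEARNING_RATES[model_name]
    except KeyError:
        known = ", ".join(RESUME_LEARNING_RATES)
        raise ValueError(
            f"unknown model {model_name!r}; expected one of: {known}"
        ) from None
```

The returned value would then be passed to whatever optimizer or scheduler configuration the training script uses; how that is wired up depends on the training setup and is not specified here.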