SmallDoge/Doge-160M-checkpoint

JingzeShi committed on Jan 31

Commit 533655e · verified · 1 parent: 0fce2c3

Update README.md

Files changed (1)

README.md +3 -3

@@ -21,9 +21,9 @@ Doge uses `wsd_scheduler` as the training scheduler, which divides the learning
 Here are the initial learning rates required to continue training at each checkpoint:
-- **[Doge-20M](https://huggingface.co/JingzeShi/Doge-20M-checkpoint)**: 8e-3
-- **[Doge-60M](https://huggingface.co/JingzeShi/Doge-60M-checkpoint)**: 6e-3
-- **[Doge-160M](https://huggingface.co/JingzeShi/Doge-160M-checkpoint)**: 4e-3
+- **[Doge-20M](https://huggingface.co/SmallDoge/Doge-20M-checkpoint)**: 8e-3
+- **[Doge-60M](https://huggingface.co/SmallDoge/Doge-60M-checkpoint)**: 6e-3
+- **[Doge-160M](https://huggingface.co/SmallDoge/Doge-160M-checkpoint)**: 4e-3
 - **Doge-320M**: 2e-3
 | Model | Learning Rate | Schedule | Warmup Steps | Stable Steps |
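
For scripts that resume training from one of these checkpoints, the learning rates listed in the README could be kept in a small lookup table. This is a minimal sketch: the values are copied from the README list in this diff, while `RESUME_LEARNING_RATES` and `resume_learning_rate` are hypothetical names used for illustration and are not part of the Doge codebase.

```python
# Initial learning rates for continuing training, copied from the README list.
# The dictionary and helper names are illustrative assumptions.
RESUME_LEARNING_RATES = {
    "Doge-20M": 8e-3,
    "Doge-60M": 6e-3,
    "Doge-160M": 4e-3,
    "Doge-320M": 2e-3,
}


def resume_learning_rate(model_name: str) -> float:
    """Return the README's initial learning rate for the given checkpoint."""
    try:
        return RESUME_LEARNING_RATES[model_name]
    except KeyError:
        # Fail loudly instead of silently falling back to a default rate.
        raise ValueError(f"No learning rate listed for {model_name!r}") from None
```

Raising on an unknown name, rather than returning a default, avoids resuming a run with a learning rate the README never specified.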