lm1-8b7-12b-beta / README.md

Ablation testing whether beta2=0.95 is better than beta2=0.999. The answer is yes: beta2=0.95 is more stable and leads to slightly lower loss, as seen by comparing the TensorBoard logs of this model (lm1-8b7-12b-beta) with those of the normal lm1-8b7-12b.
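
For intuition about what the two values in the ablation above change, here is a minimal sketch. It assumes beta2 is the second-moment decay rate of an Adam-style optimizer (this README does not name the optimizer), and the gradient stream and the helper names `effective_window` and `second_moment_trace` are hypothetical, made up for illustration. It only shows how quickly the running average of squared gradients forgets a spike; it is not an analysis of this training run.

```python
def effective_window(beta2: float) -> float:
    """Approximate number of recent steps the second-moment average covers."""
    return 1.0 / (1.0 - beta2)


def second_moment_trace(grads, beta2):
    """Bias-corrected exponential moving average of squared gradients."""
    v = 0.0
    trace = []
    for step, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g
        trace.append(v / (1.0 - beta2 ** step))  # Adam-style bias correction
    return trace


# Hypothetical gradient stream: small values, one spike, then small values again.
grads = [0.1] * 50 + [10.0] + [0.1] * 200

for beta2 in (0.95, 0.999):
    trace = second_moment_trace(grads, beta2)
    print(
        f"beta2={beta2}: window ~{effective_window(beta2):.0f} steps, "
        f"estimate right after spike={trace[50]:.4f}, "
        f"200 steps later={trace[-1]:.4f}"
    )
```

With beta2=0.95 the average spans roughly 20 steps, so it reacts to the spike quickly and has mostly forgotten it 200 steps later. With beta2=0.999 it spans roughly 1000 steps, so the spike still inflates the estimate long afterwards.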