Muennighoff committed on
Commit 4295467 · 1 Parent(s): 528b56b

Create README.md

Files changed (1)
  1. README.md +3 -0
README.md ADDED
@@ -0,0 +1,3 @@
+
+
+ Ablation of whether `beta2=0.95` is better than `beta2=0.999`. The answer is yes: `beta2=0.95` is more stable and leads to slightly lower loss, as seen by comparing the TensorBoard logs of this model with those of the normal lm1-8b7-12b.
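
The README does not say why the smaller `beta2` helps. As a rough illustration only, the sketch below assumes the optimizer is Adam-style, where `beta2` is the decay rate of the running average of squared gradients. It shows that with `beta2=0.95` this average catches up with a sudden change in gradient scale much faster than with `beta2=0.999`, which is one commonly cited intuition for better stability. The function name `second_moment_trace` and the gradient stream are hypothetical and are not taken from the training code.

```python
def second_moment_trace(grads, beta2):
    """Return the bias-corrected Adam second-moment estimate after each step."""
    v = 0.0
    trace = []
    for t, g in enumerate(grads, start=1):
        # Exponential moving average of squared gradients.
        v = beta2 * v + (1.0 - beta2) * g * g
        # Standard Adam bias correction for the zero-initialised average.
        trace.append(v / (1.0 - beta2 ** t))
    return trace


# Hypothetical gradient stream: 200 small steps, then a sustained 10x jump.
grads = [0.1] * 200 + [1.0] * 50

fast = second_moment_trace(grads, beta2=0.95)   # setting tested in this ablation
slow = second_moment_trace(grads, beta2=0.999)  # baseline setting

# Adam divides the update by sqrt(v_hat); an estimate that lags behind the
# true squared gradient (1.0 after the jump) yields a larger effective step.
print("beta2=0.95  v_hat after jump:", fast[-1])
print("beta2=0.999 v_hat after jump:", slow[-1])
```

In this toy setting the `beta2=0.999` estimate is still well below the new squared-gradient level 50 steps after the jump, so the normalised update stays inflated for longer. Whether this mechanism explains the loss curves reported here is an assumption, not something the commit states.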