Commit 4295467
Parent(s): 528b56b
Create README.md
README.md
ADDED
@@ -0,0 +1,3 @@
+
+
+ Ablation testing whether `beta2=0.95` is better than `beta2=0.999`. The answer is yes: `beta2=0.95` is more stable and reaches slightly lower loss, as seen by comparing the TensorBoard logs of this model with those of the normal lm1-8b7-12b.
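For context, `beta2` is the decay rate of Adam's second-moment (squared-gradient) moving average. The sketch below is not taken from this model's training code. It is a minimal pure-Python illustration, with a hypothetical gradient sequence and hypothetical helper names (`ema_horizon`, `second_moment_trace`), of how the two values differ in averaging horizon and in how quickly the estimate reacts to a sudden change in gradient scale.

```python
def ema_horizon(beta2: float) -> float:
    """Approximate number of recent steps an EMA with decay beta2 averages over."""
    return 1.0 / (1.0 - beta2)


def second_moment_trace(grads, beta2):
    """Return Adam's bias-corrected second-moment estimate after each step."""
    v = 0.0
    trace = []
    for t, g in enumerate(grads, start=1):
        # Exponential moving average of the squared gradient.
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction, as in the standard Adam update.
        trace.append(v / (1.0 - beta2 ** t))
    return trace


# Hypothetical scalar gradients: 200 small steps followed by 20 large ones.
grads = [0.1] * 200 + [1.0] * 20

fast = second_moment_trace(grads, beta2=0.95)
slow = second_moment_trace(grads, beta2=0.999)

print("horizon (steps):", ema_horizon(0.95), ema_horizon(0.999))
print("estimate after the jump:", fast[-1], slow[-1])
```

With `beta2=0.95` the horizon is about 20 steps, against about 1000 for `beta2=0.999`, so the estimate tracks the larger gradients much sooner. Whether this is what drives the stability difference reported here is an assumption; the README only reports the outcome.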