Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 20
Scaling Laws for Sparsely-Connected Foundation Models Paper • 2309.08520 • Published Sep 15, 2023 • 13