However, if you use gradient accumulation with bf16, the gradients are accumulated in bf16 as well, which may be undesirable: the format's low precision can make the accumulation lossy.
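To make the lossy accumulation concrete, here is a minimal pure-Python sketch. It simulates bf16 by rounding float32 bit patterns to their upper 16 bits, which is an assumption made for illustration and not how any training framework implements accumulation. The helper names `to_bf16` and `accumulate`, and the gradient values, are hypothetical.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias for nearest-even
    bits &= 0xFFFF0000  # keep sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(grads, in_bf16: bool) -> float:
    """Sum gradients, optionally rounding the running total to bf16 each step."""
    total = 0.0
    for g in grads:
        if in_bf16:
            total = to_bf16(total + to_bf16(g))
        else:
            total = total + g  # Python float (fp64) used as the reference
    return total


# Hypothetical micro-batch gradients: 1000 steps of 0.01 each.
micro_grads = [0.01] * 1000
exact_total = accumulate(micro_grads, in_bf16=False)
bf16_total = accumulate(micro_grads, in_bf16=True)
print(f"reference sum: {exact_total:.4f}, simulated bf16 sum: {bf16_total:.4f}")
```

In this sketch the simulated bf16 total stops growing well short of the reference sum of about 10. Once the running total is large enough that the gap between neighbouring bf16 values exceeds twice the increment, each addition rounds back to the same value and the contribution is lost.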