If you start getting loss=NaN, or the model exhibits some other abnormal behavior due to inf or nan in
activations or weights, you need to find where the first underflow or overflow happens and what led to it.
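
To illustrate the idea of locating the first overflow, here is a minimal, framework-agnostic sketch in plain Python. It is not the tooling of any particular library: the helper `first_nonfinite_stage` and the toy stages `scale`, `square`, and `subtract` are hypothetical names made up for this example. The sketch runs values through named stages in order and reports the first stage whose output contains inf or nan, together with the input that stage received, since that input is usually the clue to what led to the problem.

```python
import math


def first_nonfinite_stage(stages, inputs):
    """Run `inputs` through `stages` in order.

    `stages` is a list of (name, function) pairs; each function maps one
    float to one float. Returns (name, stage_inputs, stage_outputs) for the
    first stage whose output contains inf or nan, or None if every stage
    stays finite.
    """
    values = list(inputs)
    for name, fn in stages:
        outputs = [fn(v) for v in values]
        if any(not math.isfinite(v) for v in outputs):
            # Keep the inputs too: they show what led to the overflow.
            return name, values, outputs
        values = outputs
    return None


# Hypothetical toy "layers", chosen so that float64 overflows at a known stage.
stages = [
    ("scale", lambda v: v * 1e200),   # large but still finite
    ("square", lambda v: v * v),      # exceeds the float64 range, becomes inf
    ("subtract", lambda v: v - v),    # inf - inf would give nan, never reached
]

result = first_nonfinite_stage(stages, [1.0, 2.0])
if result is not None:
    name, stage_inputs, stage_outputs = result
    print(f"first non-finite output in stage {name!r}")
    print(f"  inputs:  {stage_inputs}")
    print(f"  outputs: {stage_outputs}")
```

Stopping at the first offending stage matters because later stages only propagate the damage: here `subtract` would turn inf into nan, which hides the fact that the overflow actually started in `square`. In a real PyTorch model the same check could be attached to each submodule with forward hooks (`torch.nn.Module.register_forward_hook`) instead of an explicit list of stages.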