Training finished (float32, Xavier Init), epochs=3, lr=1e-07, clip=1, batch_size=4, loss=8.6898, perplexity=nan (commit 491e7bc, verified; Vishwas1 committed on Feb 12)
Training finished (float32, no LayerNorm), epochs=3, lr=1e-07, clip=1, batch_size=4, loss=9.0133, perplexity=nan (commit d7496e8, verified; Vishwas1 committed on Feb 12)