license: apache-2.0
datasets:
  - sgoel9/paul_graham_essays

This is the trained model file for Ch2 - LLMs are MultiTask Learners. This chapter builds a GPT2-124M model from scratch for text generation. Use the best_model.pt checkpoint for inference. Because the model was pre-trained on a small amount of data, it has overfitted, but it can still generate sensible text.
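For orientation, the "124M" in the model name can be reproduced with a quick parameter count. The sketch below assumes the standard GPT-2 small configuration (vocabulary of 50257 tokens, context length 1024, hidden size 768, 12 layers, biases on all linear layers, and an output head that shares weights with the token embedding). These values are assumptions taken from the original GPT-2 small setup, not read from this checkpoint, so the chapter's implementation may differ.

```python
def gpt2_param_count(vocab_size=50257, context_len=1024, d_model=768, n_layers=12):
    """Count parameters for a GPT-2 style decoder (assumed standard layout)."""
    tok_emb = vocab_size * d_model            # token embedding (assumed tied with the output head)
    pos_emb = context_len * d_model           # learned positional embedding
    layer_norm = 2 * d_model                  # scale and shift
    # Attention: fused QKV projection plus output projection, both with bias
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model, then project back, both with bias
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    block = 2 * layer_norm + attn + mlp       # two layer norms per transformer block
    return tok_emb + pos_emb + n_layers * block + layer_norm  # plus final layer norm


print(gpt2_param_count())  # 124439808
```

If the output head is not tied to the token embedding, the count grows by another `vocab_size * d_model` parameters.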

Plots

Loss (Train):

ch2_05_train_epoch_loss.png

Perplexity (Train):

image/png

Loss (Val):

image/png

Perplexity (Val):

image/png
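
The loss and perplexity plots are usually two views of the same quantity: perplexity is the exponential of the mean cross-entropy loss. The sketch below illustrates that relationship, assuming the loss is a per-token cross-entropy in nats (the default for common cross-entropy implementations). The loss values are made up for illustration and are not taken from this training run.

```python
import math


def perplexity(mean_cross_entropy):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Hypothetical per-epoch losses, for illustration only
example_losses = [6.0, 4.5, 3.0]
for epoch, loss in enumerate(example_losses, start=1):
    print(f"epoch {epoch}: loss={loss:.2f} perplexity={perplexity(loss):.1f}")
```

A widening gap between the training and validation curves under this transformation is one common sign of the overfitting mentioned above.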