---
license: apache-2.0
datasets:
- sgoel9/paul_graham_essays
---
This repository contains the trained model files for Ch2 - LLMs are MultiTask Learners.
This chapter builds a GPT-2 (124M) model from scratch for text generation. Please use the best_model.pt
checkpoint for inference.
Since we pre-trained on only a small amount of data, the model has overfit, but it can still generate sensible text.
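A minimal sketch of loading the checkpoint for inference is below. The model class here is a stand-in (`nn.Linear` as a placeholder, and the checkpoint is created locally so the snippet is self-contained); in practice you would instantiate the chapter's GPT-2 class and point `torch.load` at the downloaded best_model.pt.

```python
import torch
import torch.nn as nn

# Placeholder for the chapter's GPT-2 (124M) model class; swap in the
# actual class from the chapter's code when loading best_model.pt.
model = nn.Linear(8, 8)

# For this self-contained sketch we save a checkpoint first; in practice
# best_model.pt is produced by the chapter's training loop.
torch.save(model.state_dict(), "best_model.pt")

# Inference-time loading: restore the weights and switch to eval mode
# so dropout and similar layers behave deterministically.
checkpoint = torch.load("best_model.pt", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()
```

The `map_location="cpu"` argument makes the load work on machines without a GPU; call `model.eval()` before generating text so the model runs in inference mode.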
Plots
- Loss (Train)
- Loss (Val)
- Perplexity (Val)