---
license: apache-2.0
datasets:
- sgoel9/paul_graham_essays
---
This repository contains the trained model files for Ch2 - LLMs are MultiTask Learners.
This chapter builds a GPT-2 (124M) model from scratch for text generation. Please use the best_model.pt
checkpoint for inference.
Since we pre-trained on only a small amount of data, the model has overfit, but it can still generate sensible text.
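A minimal sketch of loading the checkpoint for inference is below. The model class here is a stand-in (`nn.Linear` as a placeholder, and the checkpoint is created locally so the snippet is self-contained); in practice you would instantiate the chapter's GPT-2 class and point `torch.load` at the downloaded best_model.pt.

```python
import torch
import torch.nn as nn

# Placeholder for the chapter's GPT-2 (124M) model class; swap in the
# actual class from the chapter's code when loading best_model.pt.
model = nn.Linear(8, 8)

# For this self-contained sketch we save a checkpoint first; in practice
# best_model.pt is produced by the chapter's training loop.
torch.save(model.state_dict(), "best_model.pt")

# Inference-time loading: restore the weights and switch to eval mode
# so dropout and similar layers behave deterministically.
checkpoint = torch.load("best_model.pt", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()
```

The `map_location="cpu"` argument makes the load work on machines without a GPU; call `model.eval()` before generating text so the model runs in inference mode.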
Plots
- Loss (Train)
- Loss (Val)
- Perplexity (Val)