---
license: apache-2.0
datasets:
- sgoel9/paul_graham_essays
---
|
This is the trained model file for Ch2 - LLMs are MultiTask Learners.

This chapter builds a GPT-2 124M model from scratch for text generation. Please use the `best_model.pt` checkpoint for inference.

Since we pre-trained on a small amount of data, the model has overfit, but it can still generate sensible text.
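
The checkpoint can be restored with the standard PyTorch `state_dict` pattern. The sketch below uses a tiny stand-in module, since the chapter's actual GPT-2 class and its config are not shown here; substitute the model definition from the chapter's code.

```python
import torch
import torch.nn as nn

# Stand-in for the chapter's GPT-2 124M model class (an assumption --
# use the real model definition and config from the chapter's code).
model = nn.Sequential(nn.Embedding(64, 16), nn.Linear(16, 64))

# Save a checkpoint the same way best_model.pt would have been saved.
torch.save(model.state_dict(), "best_model.pt")

# Restore into a freshly constructed model and switch to inference mode.
restored = nn.Sequential(nn.Embedding(64, 16), nn.Linear(16, 64))
restored.load_state_dict(torch.load("best_model.pt", map_location="cpu"))
restored.eval()
```

`map_location="cpu"` lets a checkpoint trained on GPU load on a CPU-only machine.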
|
|
|
## Plots

Loss (Train):



Perplexity (Train):



Loss (Val):

|
Perplexity (Val):


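
The perplexity curves are presumably derived from the loss curves: perplexity is the exponential of the mean per-token cross-entropy (natural-log) loss. A minimal sketch of that conversion:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats)
    to perplexity via exponentiation."""
    return math.exp(mean_nll)

# A loss of 4.0 nats corresponds to a perplexity of ~54.6.
print(perplexity(4.0))
```

A perfectly confident model (zero loss) has a perplexity of 1, so lower is better on both plots.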