---
license: apache-2.0
datasets:
- sgoel9/paul_graham_essays
---
This is the trained model file for Ch2 - LLMs are Multitask Learners.
This chapter builds a GPT-2 (124M) model from scratch for text generation. Please use the `best_model.pt` checkpoint for inference.
Since we pre-trained on a small amount of data, the model has overfit, but it can still generate sensible text. A minimal inference sketch is shown below.
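## Inference sketch

The snippet below is a minimal sketch of how `best_model.pt` could be loaded for generation. It assumes the chapter's code exposes a `GPT` class and a `GPTConfig` in a `model` module with standard GPT-2 124M hyperparameters, and that the checkpoint is a plain `state_dict`; adjust the import, config values, and checkpoint handling to match the actual repo.

```python
import torch
import tiktoken

# ASSUMPTION: `GPT` and `GPTConfig` are placeholders for the model class and
# config defined in this chapter's training code; rename to match your repo.
from model import GPT, GPTConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

# GPT-2 124M hyperparameters (assumed to match the chapter's configuration).
config = GPTConfig(vocab_size=50257, block_size=1024, n_layer=12, n_head=12, n_embd=768)
model = GPT(config)

# Load the best checkpoint saved during training (assumed to be a raw state_dict).
state_dict = torch.load("best_model.pt", map_location=device)
model.load_state_dict(state_dict)
model.to(device)
model.eval()

# GPT-2 BPE tokenizer and an example prompt (prompt is illustrative only).
enc = tiktoken.get_encoding("gpt2")
idx = torch.tensor([enc.encode("The best way to start a startup is")], device=device)

# Simple temperature sampling loop; the context is capped at block_size tokens.
with torch.no_grad():
    for _ in range(100):
        logits = model(idx[:, -config.block_size:])
        # Some implementations return (logits, loss); keep only the logits if so.
        if isinstance(logits, tuple):
            logits = logits[0]
        probs = torch.softmax(logits[:, -1, :] / 0.8, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)

print(enc.decode(idx[0].tolist()))
```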
## Plots
Loss (Train):
![ch2_05_train_epoch_loss.png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ht1Tfjuoqywbf5GF06jMx.png)
Perplexity (Train):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/psCddxI08z64FKzPH3ADk.png)
Loss (Val):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ul5sRV2g0HT2CTCU1FQBT.png)
Perplexity (Val):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/TmZ6cn7g48q3sAjgsECI5.png)