---
license: apache-2.0
datasets:
- sgoel9/paul_graham_essays
---

This is the trained model file for Ch2 - LLMs are MultiTask Learners. This chapter builds a GPT2-124M from scratch for text generation. Please use the `best_model.pt` checkpoint for inference; a minimal loading sketch is included after the plots below. Since we pre-trained on a small amount of data, the model has overfitted, but it can still generate sensible text.

## Plots

Loss (Train):

![ch2_05_train_epoch_loss.png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ht1Tfjuoqywbf5GF06jMx.png)

Perplexity (Train):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/psCddxI08z64FKzPH3ADk.png)

Loss (Val):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ul5sRV2g0HT2CTCU1FQBT.png)

Perplexity (Val):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/TmZ6cn7g48q3sAjgsECI5.png)
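
## Usage

A minimal inference sketch for loading `best_model.pt`, assuming the checkpoint holds the model's `state_dict` and that a `GPT` class and `GPTConfig` are importable from the chapter's training code (both names are illustrative, not this repository's actual API). Greedy decoding is used for simplicity; the chapter's own sampling code may differ.

```python
# Hedged inference sketch: GPT, GPTConfig, and the checkpoint layout
# are assumed from the chapter's code, not confirmed by this model card.
import torch
import tiktoken  # GPT-2 BPE tokenizer

from model import GPT, GPTConfig  # hypothetical import from the chapter's repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the best checkpoint (assumed to be a plain state_dict).
state_dict = torch.load("best_model.pt", map_location=device)
model = GPT(GPTConfig())  # GPT2-124M defaults assumed
model.load_state_dict(state_dict)
model.to(device)
model.eval()

# Encode a prompt with the GPT-2 tokenizer.
enc = tiktoken.get_encoding("gpt2")
idx = torch.tensor([enc.encode("The best startups are")], device=device)

# Greedy autoregressive decoding, one token at a time.
max_new_tokens = 50
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(idx)  # (1, T, vocab_size) output shape assumed
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)

print(enc.decode(idx[0].tolist()))
```

Swapping the greedy `argmax` for temperature sampling over `softmax(logits / T)` usually gives more varied text from a model this overfitted.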