---
license: apache-2.0
datasets:
- sgoel9/paul_graham_essays
---

This is the trained model file for Ch2 - LLMs are MultiTask Learners. This chapter builds a GPT2-124M from scratch for text generation. Please use the `best_model.pt` checkpoint for inference; a minimal loading sketch is included after the plots below. Since we pre-trained on a small amount of data, the model has overfitted, but it can still generate sensible text.

## Plots

Loss (Train):

![ch2_05_train_epoch_loss.png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ht1Tfjuoqywbf5GF06jMx.png)

Perplexity (Train):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/psCddxI08z64FKzPH3ADk.png)

Loss (Val):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ul5sRV2g0HT2CTCU1FQBT.png)

Perplexity (Val):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/TmZ6cn7g48q3sAjgsECI5.png)
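
## Usage

A minimal inference sketch for loading `best_model.pt`, assuming the checkpoint holds the model's `state_dict` and that a `GPT` class and `GPTConfig` are importable from the chapter's training code (both names are illustrative, not this repository's actual API). Greedy decoding is used for simplicity; the chapter's own sampling code may differ.

```python
# Hedged inference sketch: GPT, GPTConfig, and the checkpoint layout
# are assumed from the chapter's code, not confirmed by this model card.
import torch
import tiktoken  # GPT-2 BPE tokenizer

from model import GPT, GPTConfig  # hypothetical import from the chapter's repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the best checkpoint (assumed to be a plain state_dict).
state_dict = torch.load("best_model.pt", map_location=device)
model = GPT(GPTConfig())  # GPT2-124M defaults assumed
model.load_state_dict(state_dict)
model.to(device)
model.eval()

# Encode a prompt with the GPT-2 tokenizer.
enc = tiktoken.get_encoding("gpt2")
idx = torch.tensor([enc.encode("The best startups are")], device=device)

# Greedy autoregressive decoding, one token at a time.
max_new_tokens = 50
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(idx)  # (1, T, vocab_size) output shape assumed
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)

print(enc.decode(idx[0].tolist()))
```

Swapping the greedy `argmax` for temperature sampling over `softmax(logits / T)` usually gives more varied text from a model this overfitted.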