---
license: apache-2.0
datasets:
- sgoel9/paul_graham_essays
---
This is the trained model file for Ch2 - LLMs are Multitask Learners.
This chapter builds a GPT-2 (124M) model from scratch for text generation. Please use the `best_model.pt` checkpoint for inference.
Since we pre-trained on a small amount of data, the model has overfit, but it can still generate sensible text. A minimal inference sketch is shown below.
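## Inference sketch

The snippet below is a minimal sketch of how `best_model.pt` could be loaded for generation. It assumes the chapter's code exposes a `GPT` class and a `GPTConfig` in a `model` module with standard GPT-2 124M hyperparameters, and that the checkpoint is a plain `state_dict`; adjust the import, config values, and checkpoint handling to match the actual repo.

```python
import torch
import tiktoken

# ASSUMPTION: `GPT` and `GPTConfig` are placeholders for the model class and
# config defined in this chapter's training code; rename to match your repo.
from model import GPT, GPTConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

# GPT-2 124M hyperparameters (assumed to match the chapter's configuration).
config = GPTConfig(vocab_size=50257, block_size=1024, n_layer=12, n_head=12, n_embd=768)
model = GPT(config)

# Load the best checkpoint saved during training (assumed to be a raw state_dict).
state_dict = torch.load("best_model.pt", map_location=device)
model.load_state_dict(state_dict)
model.to(device)
model.eval()

# GPT-2 BPE tokenizer and an example prompt (prompt is illustrative only).
enc = tiktoken.get_encoding("gpt2")
idx = torch.tensor([enc.encode("The best way to start a startup is")], device=device)

# Simple temperature sampling loop; the context is capped at block_size tokens.
with torch.no_grad():
    for _ in range(100):
        logits = model(idx[:, -config.block_size:])
        # Some implementations return (logits, loss); keep only the logits if so.
        if isinstance(logits, tuple):
            logits = logits[0]
        probs = torch.softmax(logits[:, -1, :] / 0.8, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)

print(enc.decode(idx[0].tolist()))
```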
## Plots
Loss (Train):
![ch2_05_train_epoch_loss.png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ht1Tfjuoqywbf5GF06jMx.png)
Perplexity (Train):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/psCddxI08z64FKzPH3ADk.png)
Loss (Val):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/Ul5sRV2g0HT2CTCU1FQBT.png)
Perplexity (Val):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62790519541f3d2dfa79a6cb/TmZ6cn7g48q3sAjgsECI5.png)