dd-gpt2-medium-wikitext

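This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. Final evaluation-set results are not listed here; per-step validation metrics appear in the training results further down.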

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
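
To make the scheduler settings concrete, here is a minimal sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to the peak of 0.0001 over the first 10% of steps, then decays linearly back to 0. The function name `linear_warmup_decay_lr` and the total step count are assumptions for illustration. The total of roughly 8,910 steps is estimated from the step and epoch columns of the training results (500 steps per 0.2806 epochs, times 5 epochs); the exact count and rounding used in the original run are not stated.

```python
import math


def linear_warmup_decay_lr(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale up from 0 toward base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: total optimizer steps estimated from the results table,
# not a value reported by the model card itself.
TOTAL_STEPS = 8910

for step in (0, 500, 891, 4500, 8910):
    print(step, f"{linear_warmup_decay_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the peak learning rate is reached after about 891 steps, so the first logged checkpoint at step 500 is still inside the warmup phase.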

Training Loss	Epoch	Step	Validation Loss	Accuracy	Perplexity	Bleu
6.3499	0.2806	500	6.2328	0.1688	509.1785	0.0261
5.4979	0.5612	1000	5.3734	0.2228	215.6041	0.0506
4.8996	0.8418	1500	4.7975	0.2650	121.2067	0.0669
4.5102	1.1223	2000	4.4042	0.2992	81.7968	0.0791
4.2029	1.4029	2500	4.1110	0.3301	61.0070	0.0887
4.0332	1.6835	3000	3.9383	0.3457	51.3319	0.0996
3.8911	1.9641	3500	3.8146	0.3575	45.3566	0.1107
3.7698	2.2447	4000	3.7189	0.3663	41.2194	0.1154
3.6812	2.5253	4500	3.6449	0.3729	38.2808	0.1225
3.6300	2.8058	5000	3.5815	0.3790	35.9274	0.1216
3.5287	3.0864	5500	3.5309	0.3840	34.1532	0.1261
3.5032	3.3670	6000	3.4913	0.3883	32.8286	0.1302
3.4684	3.6476	6500	3.4542	0.3917	31.6327	0.1304
3.4365	3.9282	7000	3.4250	0.3949	30.7240	0.1303
3.3894	4.2088	7500	3.4020	0.3973	30.0227	0.1327
3.3446	4.4893	8000	3.3850	0.3992	29.5189	0.1336
3.3532	4.7699	8500	3.3729	0.4006	29.1627	0.1356
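
The Perplexity column appears to be the exponential of the Validation Loss column, which is the usual definition when the loss is a mean cross-entropy in nats. The sketch below checks that relationship on a few rows copied from the table; the helper name `perplexity` is illustrative, and small differences are expected because the logged loss is rounded to four decimals.

```python
import math

# (validation loss, reported perplexity) pairs copied from the results table.
rows = [
    (6.2328, 509.1785),
    (4.7975, 121.2067),
    (3.5815, 35.9274),
    (3.3729, 29.1627),
]


def perplexity(loss):
    """Perplexity from a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


for loss, reported in rows:
    print(f"loss={loss:.4f}  exp(loss)={perplexity(loss):.4f}  reported={reported:.4f}")
```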

Model size: 355M params (Safetensors)
Tensor type: F32
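
As a rough back-of-the-envelope check, the parameter count and tensor type give an estimate of the weight size: F32 stores 4 bytes per parameter, so about 355M parameters come to roughly 1.4 GB. The helper `weights_size_bytes` is a hypothetical name, and the figure ignores file metadata and any memory needed for activations or optimizer state.

```python
def weights_size_bytes(num_params, bytes_per_param=4):
    """Approximate size of the raw weights; F32 uses 4 bytes per parameter."""
    return num_params * bytes_per_param


# 355M is the rounded figure from the listing, so the result is approximate.
NUM_PARAMS = 355_000_000

size = weights_size_bytes(NUM_PARAMS)
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```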