parallel-gpt2-medium-wikitext

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

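Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training: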

learning_rate: 0.0001
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
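As a rough illustration of what lr_scheduler_type: linear with lr_scheduler_warmup_ratio: 0.1 implies, the sketch below ramps the learning rate linearly from 0 to the peak of 0.0001 over the first 10% of optimizer steps, then decays it linearly to 0. This is the same shape as get_linear_schedule_with_warmup in transformers. Two inputs are assumptions rather than values stated on this card. First, about 1782 optimizer steps per epoch, inferred from the Step and Epoch columns of the results table (500 / 1782 ≈ 0.2806), which gives 8910 total steps. Second, the warmup step count is taken to be rounded up with math.ceil.

```python
import math

# Values taken from the hyperparameter list.
PEAK_LR = 1e-4
NUM_EPOCHS = 5
WARMUP_RATIO = 0.1

# Assumption: ~1782 optimizer steps per epoch, inferred from the results table
# (step 500 is logged at epoch 0.2806, and 500 / 1782 ≈ 0.2806).
STEPS_PER_EPOCH = 1782


def linear_warmup_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


total_steps = NUM_EPOCHS * STEPS_PER_EPOCH            # 8910
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)  # 891 (assumed rounding)

for step in (0, 500, warmup_steps, 4000, 8500, total_steps):
    lr = linear_warmup_lr(step, total_steps, warmup_steps, PEAK_LR)
    print(f"step {step:5d}: lr = {lr:.3e}")
```

Under these assumptions, the first logged evaluation at step 500 still falls inside warmup, at roughly 56% of the peak learning rate.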

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Perplexity	BLEU
6.077	0.2806	500	5.9554	0.1870	385.8189	0.0352
5.1123	0.5612	1000	4.9836	0.2568	145.9931	0.0625
4.4123	0.8418	1500	4.3035	0.3159	73.9588	0.0843
4.0245	1.1223	2000	3.9678	0.3470	52.8693	0.1076
3.8298	1.4029	2500	3.7842	0.3630	44.0014	0.1166
3.7181	1.6835	3000	3.6620	0.3733	38.9404	0.1272
3.6123	1.9641	3500	3.5694	0.3818	35.4958	0.1311
3.4993	2.2447	4000	3.5029	0.3877	33.2118	0.1384
3.4358	2.5253	4500	3.4484	0.3930	31.4506	0.1358
3.4039	2.8058	5000	3.3989	0.3979	29.9323	0.1403
3.2908	3.0864	5500	3.3633	0.4018	28.8837	0.1409
3.2828	3.3670	6000	3.3326	0.4051	28.0103	0.1446
3.2606	3.6476	6500	3.3031	0.4081	27.1958	0.1457
3.234	3.9282	7000	3.2796	0.4106	26.5655	0.1433
3.1713	4.2088	7500	3.2621	0.4126	26.1045	0.1461
3.1314	4.4893	8000	3.2476	0.4145	25.7281	0.1455
3.1412	4.7699	8500	3.2350	0.4161	25.4075	0.1473
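The Perplexity column is consistent with perplexity = exp(validation loss). This is the usual definition for a causal language model whose loss is the mean per-token cross-entropy in nats. The check below recomputes it for three rows of the table. The losses are rounded to four decimals, so agreement is only expected to within about 0.01%.

```python
import math

# (step, validation loss, reported perplexity), copied from three rows of the results table.
rows = [
    (500, 5.9554, 385.8189),
    (4000, 3.5029, 33.2118),
    (8500, 3.2350, 25.4075),
]

for step, eval_loss, reported_ppl in rows:
    # Perplexity is the exponential of the mean per-token cross-entropy loss.
    ppl = math.exp(eval_loss)
    print(f"step {step}: exp({eval_loss}) = {ppl:.2f} (reported {reported_ppl})")
```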

Weights are stored in Safetensors format.

Model size: 357M params
Tensor type: F32
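A back-of-the-envelope estimate of the weight footprint follows from two figures: 357M parameters at 4 bytes each (F32). Treating 357M as exact is an assumption, because the card rounds it. The Safetensors file also carries a small JSON header, so the real file is slightly larger.

```python
# Assumption: exactly 357M parameters (the card rounds this figure), stored as 32-bit floats.
NUM_PARAMS = 357_000_000
BYTES_PER_F32 = 4

weight_bytes = NUM_PARAMS * BYTES_PER_F32
print(f"{weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")  # prints 1.43 GB (1.33 GiB)
```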