metadata

library_name: transformers
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - bleu
model-index:
  - name: parallel-gpt2-medium-wikitext
    results: []

parallel-gpt2-medium-wikitext

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 3.1010
Accuracy: 0.4274
Perplexity: 22.2205
Bleu: 0.1461
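
The reported perplexity appears to be the exponential of the evaluation loss, which is the usual convention when the loss is a mean cross-entropy in nats. A minimal sketch under that assumption (the function name is illustrative and not taken from the training script):

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


# Final evaluation loss reported in this card.
final_eval_loss = 3.1010
print(f"perplexity ~ {perplexity_from_loss(final_eval_loss):.2f}")
```

The small difference from the reported 22.2205 is expected, since the loss shown here is rounded to four decimal places.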

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
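
The batch-size and scheduler settings above can be illustrated with a short sketch. It assumes a single device (consistent with 16 × 2 = 32) and the standard shape of a linear schedule with warmup: the learning rate rises linearly from 0 to its peak over the warmup steps, then decays linearly to 0. The function names are illustrative, and `total_steps` below is a hypothetical value, since the card does not state the total number of optimizer steps.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = 1e-4, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", effective_batch_size(16, 2))

total_steps = 1000  # hypothetical; not stated in this card
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

With these settings the peak learning rate of 0.0001 is reached after the first 10% of training steps.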

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Perplexity	Bleu
6.4455	0.1404	500	6.3313	0.1766	561.8647	0.0257
5.7254	0.2807	1000	5.6235	0.2136	276.8543	0.0454
5.1084	0.4211	1500	4.9822	0.2576	145.7898	0.0649
4.5994	0.5614	2000	4.5052	0.2929	90.4901	0.0741
4.2338	0.7018	2500	4.1378	0.3273	62.6674	0.0937
3.9975	0.8421	3000	3.9286	0.3465	50.8364	0.1031
3.8648	0.9825	3500	3.7926	0.3583	44.3697	0.1166
3.7164	1.1227	4000	3.6987	0.3667	40.3929	0.1226
3.6639	1.2630	4500	3.6221	0.3734	37.4157	0.1282
3.582	1.4034	5000	3.5575	0.3796	35.0763	0.1277
3.5315	1.5437	5500	3.5064	0.3840	33.3276	0.1312
3.5025	1.6841	6000	3.4594	0.3881	31.7989	0.1366
3.4462	1.8244	6500	3.4208	0.3919	30.5952	0.1310
3.4167	1.9648	7000	3.3863	0.3956	29.5564	0.1355
3.2967	2.1050	7500	3.3548	0.3989	28.6395	0.1317
3.2909	2.2453	8000	3.3290	0.4015	27.9115	0.1381
3.2593	2.3857	8500	3.3044	0.4039	27.2323	0.1422
3.2408	2.5260	9000	3.2826	0.4061	26.6448	0.1412
3.2278	2.6664	9500	3.2592	0.4090	26.0285	0.1436
3.2172	2.8067	10000	3.2415	0.4105	25.5733	0.1412
3.2145	2.9471	10500	3.2227	0.4125	25.0946	0.1402
3.0749	3.0873	11000	3.2099	0.4143	24.7768	0.1413
3.0777	3.2276	11500	3.1978	0.4160	24.4784	0.1420
3.0743	3.368	12000	3.1855	0.4174	24.1797	0.1438
3.0679	3.5084	12500	3.1735	0.4183	23.8912	0.1397
3.0635	3.6487	13000	3.1599	0.4200	23.5691	0.1423
3.0262	3.7891	13500	3.1489	0.4211	23.3095	0.1432
3.0382	3.9294	14000	3.1397	0.4223	23.0970	0.1461
2.9525	4.0696	14500	3.1335	0.4233	22.9539	0.1457
2.9621	4.2100	15000	3.1270	0.4239	22.8057	0.1454
2.9422	4.3503	15500	3.1211	0.4250	22.6718	0.1468
2.9224	4.4907	16000	3.1149	0.4257	22.5322	0.1454
2.9475	4.6310	16500	3.1084	0.4264	22.3862	0.1497
2.9318	4.7714	17000	3.1041	0.4270	22.2899	0.1468
2.9268	4.9117	17500	3.1010	0.4274	22.2205	0.1461

Framework versions

Transformers 4.49.0
PyTorch 2.6.0+cu124
Datasets 3.3.2
Tokenizers 0.21.0