---
library_name: transformers
tags:
- generated_from_trainer
metrics:
- accuracy
- bleu
model-index:
- name: parallel-gpt2-medium-wikitext
  results: []
---

# parallel-gpt2-medium-wikitext

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (the reported perplexity is the exponential of the evaluation loss; see the check after the list):

- Loss: 3.1010
- Accuracy: 0.4274
- Perplexity: 22.2205
- Bleu: 0.1461
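
As a quick consistency check (not part of the autogenerated card), perplexity for a causal language model is the exponential of the mean cross-entropy loss, and the reported numbers agree with that relationship:

```python
import math

# exp(evaluation loss) should reproduce the reported perplexity.
print(math.exp(3.1010))  # ≈ 22.2202, matching the reported 22.2205 up to rounding
```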

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch reproducing them follows the list):

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
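
A minimal sketch of how these settings map onto Hugging Face `TrainingArguments`. Only the values listed above come from the card; `output_dir` and anything not listed (logging, saving, evaluation cadence) are hypothetical placeholders:

```python
from transformers import TrainingArguments

# Sketch under the stated assumptions; values mirror the list above.
training_args = TrainingArguments(
    output_dir="parallel-gpt2-medium-wikitext",  # hypothetical placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,  # 16 per device x 2 steps = total batch size 32
    optim="adamw_torch",            # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```

Note that `total_train_batch_size: 32` is a derived value (16 per device x 2 accumulation steps), not an argument that is set directly.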

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | Perplexity | Bleu   |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:----------:|:------:|
| 6.4455        | 0.1404 | 500   | 6.3313          | 0.1766   | 561.8647   | 0.0257 |
| 5.7254        | 0.2807 | 1000  | 5.6235          | 0.2136   | 276.8543   | 0.0454 |
| 5.1084        | 0.4211 | 1500  | 4.9822          | 0.2576   | 145.7898   | 0.0649 |
| 4.5994        | 0.5614 | 2000  | 4.5052          | 0.2929   | 90.4901    | 0.0741 |
| 4.2338        | 0.7018 | 2500  | 4.1378          | 0.3273   | 62.6674    | 0.0937 |
| 3.9975        | 0.8421 | 3000  | 3.9286          | 0.3465   | 50.8364    | 0.1031 |
| 3.8648        | 0.9825 | 3500  | 3.7926          | 0.3583   | 44.3697    | 0.1166 |
| 3.7164        | 1.1227 | 4000  | 3.6987          | 0.3667   | 40.3929    | 0.1226 |
| 3.6639        | 1.2630 | 4500  | 3.6221          | 0.3734   | 37.4157    | 0.1282 |
| 3.5820        | 1.4034 | 5000  | 3.5575          | 0.3796   | 35.0763    | 0.1277 |
| 3.5315        | 1.5437 | 5500  | 3.5064          | 0.3840   | 33.3276    | 0.1312 |
| 3.5025        | 1.6841 | 6000  | 3.4594          | 0.3881   | 31.7989    | 0.1366 |
| 3.4462        | 1.8244 | 6500  | 3.4208          | 0.3919   | 30.5952    | 0.1310 |
| 3.4167        | 1.9648 | 7000  | 3.3863          | 0.3956   | 29.5564    | 0.1355 |
| 3.2967        | 2.1050 | 7500  | 3.3548          | 0.3989   | 28.6395    | 0.1317 |
| 3.2909        | 2.2453 | 8000  | 3.3290          | 0.4015   | 27.9115    | 0.1381 |
| 3.2593        | 2.3857 | 8500  | 3.3044          | 0.4039   | 27.2323    | 0.1422 |
| 3.2408        | 2.5260 | 9000  | 3.2826          | 0.4061   | 26.6448    | 0.1412 |
| 3.2278        | 2.6664 | 9500  | 3.2592          | 0.4090   | 26.0285    | 0.1436 |
| 3.2172        | 2.8067 | 10000 | 3.2415          | 0.4105   | 25.5733    | 0.1412 |
| 3.2145        | 2.9471 | 10500 | 3.2227          | 0.4125   | 25.0946    | 0.1402 |
| 3.0749        | 3.0873 | 11000 | 3.2099          | 0.4143   | 24.7768    | 0.1413 |
| 3.0777        | 3.2276 | 11500 | 3.1978          | 0.4160   | 24.4784    | 0.1420 |
| 3.0743        | 3.3680 | 12000 | 3.1855          | 0.4174   | 24.1797    | 0.1438 |
| 3.0679        | 3.5084 | 12500 | 3.1735          | 0.4183   | 23.8912    | 0.1397 |
| 3.0635        | 3.6487 | 13000 | 3.1599          | 0.4200   | 23.5691    | 0.1423 |
| 3.0262        | 3.7891 | 13500 | 3.1489          | 0.4211   | 23.3095    | 0.1432 |
| 3.0382        | 3.9294 | 14000 | 3.1397          | 0.4223   | 23.0970    | 0.1461 |
| 2.9525        | 4.0696 | 14500 | 3.1335          | 0.4233   | 22.9539    | 0.1457 |
| 2.9621        | 4.2100 | 15000 | 3.1270          | 0.4239   | 22.8057    | 0.1454 |
| 2.9422        | 4.3503 | 15500 | 3.1211          | 0.4250   | 22.6718    | 0.1468 |
| 2.9224        | 4.4907 | 16000 | 3.1149          | 0.4257   | 22.5322    | 0.1454 |
| 2.9475        | 4.6310 | 16500 | 3.1084          | 0.4264   | 22.3862    | 0.1497 |
| 2.9318        | 4.7714 | 17000 | 3.1041          | 0.4270   | 22.2899    | 0.1468 |
| 2.9268        | 4.9117 | 17500 | 3.1010          | 0.4274   | 22.2205    | 0.1461 |

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0
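
To approximate this environment, the versions above can be pinned directly. A sketch, assuming a CUDA 12.4 setup; the `+cu124` PyTorch build typically comes from the matching PyTorch wheel index:

```bash
pip install "transformers==4.49.0" "datasets==3.3.2" "tokenizers==0.21.0"
pip install "torch==2.6.0" --index-url https://download.pytorch.org/whl/cu124
```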