Llama-3.1-8B-Instruct-PsyCourse-fold3

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the course-train-fold3 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0352
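
This repository provides a PEFT (LoRA) adapter rather than full model weights (see the framework versions below), so it must be loaded on top of the base model. The following is a minimal inference sketch, assuming the adapter is published under chchen/Llama-3.1-8B-Instruct-PsyCourse-fold3 and that you have access to the gated meta-llama/Llama-3.1-8B-Instruct base model; adjust dtype and device placement to your hardware.

```python
# Minimal inference sketch (assumed adapter id; requires access to the gated base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-PsyCourse-fold3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned LoRA adapter to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```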

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
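
For reference, these settings correspond roughly to the transformers.TrainingArguments sketch below. This is an illustrative reconstruction, not the original training script; the dataset path, output directory name, and any LoRA-specific settings are assumptions, as they are not documented here.

```python
# Illustrative reconstruction of the reported hyperparameters; not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-PsyCourse-fold3",  # assumed output name
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,   # yields a total train batch size of 16
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",              # betas=(0.9, 0.999), eps=1e-08 are the defaults
    seed=42,
)
```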

Training results

Training Loss Epoch Step Validation Loss
0.5742 0.0753 50 0.4037
0.0981 0.1505 100 0.0890
0.0775 0.2258 150 0.0663
0.075 0.3011 200 0.0574
0.0587 0.3763 250 0.0533
0.0617 0.4516 300 0.0547
0.0431 0.5269 350 0.0519
0.0573 0.6021 400 0.0479
0.0504 0.6774 450 0.0438
0.0341 0.7527 500 0.0428
0.0448 0.8279 550 0.0440
0.0373 0.9032 600 0.0414
0.0369 0.9785 650 0.0414
0.0266 1.0537 700 0.0422
0.0337 1.1290 750 0.0380
0.0379 1.2043 800 0.0424
0.0297 1.2795 850 0.0413
0.0417 1.3548 900 0.0389
0.0342 1.4300 950 0.0393
0.033 1.5053 1000 0.0387
0.0304 1.5806 1050 0.0412
0.0225 1.6558 1100 0.0380
0.0406 1.7311 1150 0.0359
0.0314 1.8064 1200 0.0378
0.0345 1.8816 1250 0.0352
0.0314 1.9569 1300 0.0352
0.0232 2.0322 1350 0.0370
0.0298 2.1074 1400 0.0358
0.0224 2.1827 1450 0.0376
0.0251 2.2580 1500 0.0403
0.0303 2.3332 1550 0.0377
0.0174 2.4085 1600 0.0399
0.02 2.4838 1650 0.0393
0.0239 2.5590 1700 0.0386
0.0377 2.6343 1750 0.0377
0.0266 2.7096 1800 0.0373
0.0229 2.7848 1850 0.0356
0.0257 2.8601 1900 0.0409
0.021 2.9354 1950 0.0365
0.0137 3.0106 2000 0.0382
0.0119 3.0859 2050 0.0439
0.0116 3.1612 2100 0.0427
0.0131 3.2364 2150 0.0435
0.0132 3.3117 2200 0.0436
0.0095 3.3870 2250 0.0448
0.0101 3.4622 2300 0.0486
0.0068 3.5375 2350 0.0472
0.0133 3.6128 2400 0.0447
0.0155 3.6880 2450 0.0423
0.0118 3.7633 2500 0.0446
0.0104 3.8386 2550 0.0464
0.0149 3.9138 2600 0.0434
0.0126 3.9891 2650 0.0439
0.0066 4.0644 2700 0.0464
0.0048 4.1396 2750 0.0502
0.0052 4.2149 2800 0.0543
0.0051 4.2901 2850 0.0537
0.0102 4.3654 2900 0.0547
0.0052 4.4407 2950 0.0546
0.0029 4.5159 3000 0.0548
0.0085 4.5912 3050 0.0552
0.0049 4.6665 3100 0.0551
0.0054 4.7417 3150 0.0553
0.0035 4.8170 3200 0.0553
0.0041 4.8923 3250 0.0554
0.0045 4.9675 3300 0.0553

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3