Llama-3.1-8B-Instruct-PsyCourse-fold3

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the course-train-fold3 dataset. It achieves the following results on the evaluation set:

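Loss: 0.0352 (the lowest validation loss in the training results table, logged at steps 1250 and 1300; the final logged step, 3300, reached 0.0553)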
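A minimal loading sketch, assuming this repository hosts a PEFT adapter rather than merged weights (the PEFT entry under framework versions suggests this, but the card does not confirm it). The adapter id is taken from the model page, the helper name `load_model` is hypothetical, and the bfloat16 dtype is an illustrative choice, not a documented setting. Access to the base model may require accepting its license on the Hub.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_ID = "chchen/Llama-3.1-8B-Instruct-PsyCourse-fold3"


def load_model():
    """Load the base model and attach the fine-tuned adapter (hypothetical helper)."""
    # Imports are kept inside the function so the module can be imported
    # even where torch, transformers, or peft are not installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: chosen here to reduce memory use
    )
    # Assumption: ADAPTER_ID contains adapter weights compatible with BASE_MODEL_ID.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return tokenizer, model
```

Calling `load_model()` downloads both repositories, so it is best run once and the result reused.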

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5.0
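The hyperparameters above can be sketched in plain Python to show how the effective batch size and the learning-rate curve follow from them. This is an illustration only: the single-device count and the total step count are assumptions (the results table ends at step 3300, epoch 4.9675), and the schedule uses a common linear-warmup plus cosine-decay formulation rather than the exact trainer code.

```python
import math

LEARNING_RATE = 1e-4      # learning_rate
WARMUP_RATIO = 0.1        # lr_scheduler_warmup_ratio
PER_DEVICE_BATCH = 1      # train_batch_size
GRAD_ACCUM_STEPS = 16     # gradient_accumulation_steps
NUM_DEVICES = 1           # assumption: single device, consistent with a total batch size of 16
TOTAL_STEPS = 3320        # assumption: rough estimate of the total optimizer steps


def effective_batch_size(per_device, accum_steps, devices):
    """Number of samples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=LEARNING_RATE,
          warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to base_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 166, 332, 1660, 3320):
        print(step, lr_at(step))
```

With a warmup ratio of 0.1, roughly the first tenth of the steps ramp the rate up, and the remainder decay it toward zero by the end of training.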

Training results

Training Loss	Epoch	Step	Validation Loss
0.5742	0.0753	50	0.4037
0.0981	0.1505	100	0.0890
0.0775	0.2258	150	0.0663
0.075	0.3011	200	0.0574
0.0587	0.3763	250	0.0533
0.0617	0.4516	300	0.0547
0.0431	0.5269	350	0.0519
0.0573	0.6021	400	0.0479
0.0504	0.6774	450	0.0438
0.0341	0.7527	500	0.0428
0.0448	0.8279	550	0.0440
0.0373	0.9032	600	0.0414
0.0369	0.9785	650	0.0414
0.0266	1.0537	700	0.0422
0.0337	1.1290	750	0.0380
0.0379	1.2043	800	0.0424
0.0297	1.2795	850	0.0413
0.0417	1.3548	900	0.0389
0.0342	1.4300	950	0.0393
0.033	1.5053	1000	0.0387
0.0304	1.5806	1050	0.0412
0.0225	1.6558	1100	0.0380
0.0406	1.7311	1150	0.0359
0.0314	1.8064	1200	0.0378
0.0345	1.8816	1250	0.0352
0.0314	1.9569	1300	0.0352
0.0232	2.0322	1350	0.0370
0.0298	2.1074	1400	0.0358
0.0224	2.1827	1450	0.0376
0.0251	2.2580	1500	0.0403
0.0303	2.3332	1550	0.0377
0.0174	2.4085	1600	0.0399
0.02	2.4838	1650	0.0393
0.0239	2.5590	1700	0.0386
0.0377	2.6343	1750	0.0377
0.0266	2.7096	1800	0.0373
0.0229	2.7848	1850	0.0356
0.0257	2.8601	1900	0.0409
0.021	2.9354	1950	0.0365
0.0137	3.0106	2000	0.0382
0.0119	3.0859	2050	0.0439
0.0116	3.1612	2100	0.0427
0.0131	3.2364	2150	0.0435
0.0132	3.3117	2200	0.0436
0.0095	3.3870	2250	0.0448
0.0101	3.4622	2300	0.0486
0.0068	3.5375	2350	0.0472
0.0133	3.6128	2400	0.0447
0.0155	3.6880	2450	0.0423
0.0118	3.7633	2500	0.0446
0.0104	3.8386	2550	0.0464
0.0149	3.9138	2600	0.0434
0.0126	3.9891	2650	0.0439
0.0066	4.0644	2700	0.0464
0.0048	4.1396	2750	0.0502
0.0052	4.2149	2800	0.0543
0.0051	4.2901	2850	0.0537
0.0102	4.3654	2900	0.0547
0.0052	4.4407	2950	0.0546
0.0029	4.5159	3000	0.0548
0.0085	4.5912	3050	0.0552
0.0049	4.6665	3100	0.0551
0.0054	4.7417	3150	0.0553
0.0035	4.8170	3200	0.0553
0.0041	4.8923	3250	0.0554
0.0045	4.9675	3300	0.0553
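Training loss keeps falling through epoch 5 while validation loss bottoms out near epoch 2 and then rises, so it can help to pick the checkpoint by validation loss rather than taking the last step. The sketch below does this for a handful of rows copied from the table above; the `LogRow` type and `best_checkpoint` helper are hypothetical names, and the full table would need to be pasted in to analyze every step.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# Excerpt of the results table (not the full log).
ROWS: List[LogRow] = [
    LogRow(0.0406, 1.7311, 1150, 0.0359),
    LogRow(0.0345, 1.8816, 1250, 0.0352),
    LogRow(0.0314, 1.9569, 1300, 0.0352),
    LogRow(0.0229, 2.7848, 1850, 0.0356),
    LogRow(0.0137, 3.0106, 2000, 0.0382),
    LogRow(0.0045, 4.9675, 3300, 0.0553),
]


def best_checkpoint(rows: List[LogRow]) -> LogRow:
    """Return the row with the lowest validation loss (earliest step on ties)."""
    return min(rows, key=lambda row: row.val_loss)


if __name__ == "__main__":
    best = best_checkpoint(ROWS)
    final = ROWS[-1]
    print(f"best step: {best.step}, validation loss: {best.val_loss}")
    print(f"final step: {final.step}, validation loss: {final.val_loss}")
```

Because `min` returns the first minimal element, ties resolve to the earlier step, which is usually the less overfit checkpoint.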

Framework versions

PEFT 0.12.0
Transformers 4.46.1
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3
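To reproduce the environment, the installed packages can be compared against the versions listed above. The following is a small sketch using only the standard library; the distribution names (for example `torch` for Pytorch) and the `version_mismatches` helper are assumptions made for illustration.

```python
from importlib import metadata

# Versions from the list above; the "+cu124" build tag on Pytorch is dropped.
EXPECTED_VERSIONS = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.5.1",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def version_mismatches(expected):
    """Map package name to (wanted, installed) for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu124" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED_VERSIONS).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means the local environment matches the listed versions; nearby versions may still work, but that is not verified here.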
