# Llama-3.1-8B-Instruct-PsyCourse-fold4

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the course-train-fold4 dataset.
It achieves the following results on the evaluation set:

- Loss: 0.0344
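
The "Framework versions" section below lists PEFT, so this repository presumably hosts a PEFT (LoRA-style) adapter on top of the instruct base model rather than merged weights. Under that assumption, a minimal usage sketch (the prompt and generation settings are illustrative):

```python
# Minimal usage sketch. Assumption: this repo is a PEFT adapter for
# meta-llama/Llama-3.1-8B-Instruct (see "Framework versions" below).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_ID = "chchen/Llama-3.1-8B-Instruct-PsyCourse-fold4"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the fine-tuned adapter

# Llama 3.1 Instruct expects its chat template, not raw strings.
messages = [{"role": "user", "content": "Summarize the key ideas of operant conditioning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```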
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
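
The card does not state which training stack produced these settings. As a point of reference only, this sketch mirrors them with `transformers.TrainingArguments`; the `output_dir` is illustrative, and the 50-step eval/logging cadence is inferred from the results table below.

```python
# Sketch only: mirrors the listed hyperparameters with the Hugging Face Trainer.
# Dataset, model, and adapter wiring are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-PsyCourse-fold4",  # illustrative
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,  # 1 sample x 16 steps = total train batch size 16
    optim="adamw_torch",             # betas=(0.9, 0.999) and eps=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
    eval_strategy="steps",           # assumption: matches the 50-step cadence below
    eval_steps=50,
    logging_steps=50,
)
```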
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5681 | 0.0763 | 50 | 0.3814 |
| 0.122 | 0.1527 | 100 | 0.0796 |
| 0.0541 | 0.2290 | 150 | 0.0592 |
| 0.0502 | 0.3053 | 200 | 0.0522 |
| 0.0666 | 0.3816 | 250 | 0.0539 |
| 0.046 | 0.4580 | 300 | 0.0493 |
| 0.0458 | 0.5343 | 350 | 0.0527 |
| 0.0448 | 0.6106 | 400 | 0.0488 |
| 0.0567 | 0.6870 | 450 | 0.0462 |
| 0.0358 | 0.7633 | 500 | 0.0410 |
| 0.0445 | 0.8396 | 550 | 0.0407 |
| 0.0462 | 0.9159 | 600 | 0.0407 |
| 0.0363 | 0.9923 | 650 | 0.0410 |
| 0.0343 | 1.0686 | 700 | 0.0370 |
| 0.0413 | 1.1449 | 750 | 0.0378 |
| 0.0322 | 1.2213 | 800 | 0.0398 |
| 0.0342 | 1.2976 | 850 | 0.0385 |
| 0.0337 | 1.3739 | 900 | 0.0436 |
| 0.0295 | 1.4502 | 950 | 0.0373 |
| 0.0267 | 1.5266 | 1000 | 0.0386 |
| 0.0287 | 1.6029 | 1050 | 0.0380 |
| 0.0504 | 1.6792 | 1100 | 0.0388 |
| 0.0317 | 1.7556 | 1150 | 0.0391 |
| 0.0448 | 1.8319 | 1200 | 0.0366 |
| 0.0278 | 1.9082 | 1250 | 0.0362 |
| 0.0347 | 1.9845 | 1300 | 0.0344 |
| 0.0201 | 2.0609 | 1350 | 0.0355 |
| 0.0238 | 2.1372 | 1400 | 0.0357 |
| 0.0299 | 2.2135 | 1450 | 0.0371 |
| 0.0155 | 2.2899 | 1500 | 0.0384 |
| 0.0157 | 2.3662 | 1550 | 0.0391 |
| 0.0222 | 2.4425 | 1600 | 0.0370 |
| 0.0245 | 2.5188 | 1650 | 0.0360 |
| 0.0206 | 2.5952 | 1700 | 0.0376 |
| 0.0198 | 2.6715 | 1750 | 0.0363 |
| 0.0209 | 2.7478 | 1800 | 0.0370 |
| 0.026 | 2.8242 | 1850 | 0.0362 |
| 0.0197 | 2.9005 | 1900 | 0.0358 |
| 0.0291 | 2.9768 | 1950 | 0.0355 |
| 0.0091 | 3.0531 | 2000 | 0.0416 |
| 0.0132 | 3.1295 | 2050 | 0.0421 |
| 0.0115 | 3.2058 | 2100 | 0.0443 |
| 0.0131 | 3.2821 | 2150 | 0.0459 |
| 0.0132 | 3.3585 | 2200 | 0.0409 |
| 0.0077 | 3.4348 | 2250 | 0.0445 |
| 0.0156 | 3.5111 | 2300 | 0.0444 |
| 0.0125 | 3.5874 | 2350 | 0.0480 |
| 0.0089 | 3.6638 | 2400 | 0.0499 |
| 0.0125 | 3.7401 | 2450 | 0.0467 |
| 0.0115 | 3.8164 | 2500 | 0.0447 |
| 0.0062 | 3.8928 | 2550 | 0.0449 |
| 0.0112 | 3.9691 | 2600 | 0.0462 |
| 0.005 | 4.0454 | 2650 | 0.0465 |
| 0.0065 | 4.1217 | 2700 | 0.0502 |
| 0.0021 | 4.1981 | 2750 | 0.0543 |
| 0.0033 | 4.2744 | 2800 | 0.0556 |
| 0.0068 | 4.3507 | 2850 | 0.0572 |
| 0.0015 | 4.4271 | 2900 | 0.0599 |
| 0.0036 | 4.5034 | 2950 | 0.0602 |
| 0.0027 | 4.5797 | 3000 | 0.0615 |
| 0.0013 | 4.6560 | 3050 | 0.0615 |
| 0.0056 | 4.7324 | 3100 | 0.0618 |
| 0.0028 | 4.8087 | 3150 | 0.0618 |
| 0.0044 | 4.8850 | 3200 | 0.0620 |
| 0.0061 | 4.9614 | 3250 | 0.0622 |
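
The minimum validation loss (0.0344 at step 1300, roughly epoch 2.0) matches the evaluation loss reported at the top of this card; after that point training loss keeps falling while validation loss climbs to 0.0622 by epoch 5, the usual signature of overfitting in the later epochs. A quick way to see this is to plot a few of the logged points; the sketch below hard-codes a subset of the rows from the table above.

```python
# Sketch: plot a subset of the logged (epoch, train loss, validation loss)
# points from the table above to show the validation-loss minimum near epoch 2.
import matplotlib.pyplot as plt

points = [  # (epoch, training loss, validation loss), copied from the table
    (0.0763, 0.5681, 0.3814),
    (0.4580, 0.0460, 0.0493),
    (0.9923, 0.0363, 0.0410),
    (1.4502, 0.0295, 0.0373),
    (1.9845, 0.0347, 0.0344),  # best validation loss (step 1300)
    (2.4425, 0.0222, 0.0370),
    (2.9768, 0.0291, 0.0355),
    (3.4348, 0.0077, 0.0445),
    (3.9691, 0.0112, 0.0462),
    (4.5034, 0.0036, 0.0602),
    (4.9614, 0.0061, 0.0622),
]

epochs = [e for e, _, _ in points]
plt.plot(epochs, [t for _, t, _ in points], marker="o", label="training loss")
plt.plot(epochs, [v for _, _, v in points], marker="o", label="validation loss")
plt.axvline(1.9845, linestyle="--", color="gray", label="best checkpoint (step 1300)")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```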
### Framework versions

- PEFT 0.12.0
- Transformers 4.46.1
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
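
A matching environment can be pinned from these versions (a sketch; the `+cu124` CUDA build of PyTorch comes from the PyTorch package index rather than plain PyPI):

```text
peft==0.12.0
transformers==4.46.1
torch==2.5.1        # card lists 2.5.1+cu124; install the CUDA build from the PyTorch index
datasets==3.1.0
tokenizers==0.20.3
```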
### Model tree for chchen/Llama-3.1-8B-Instruct-PsyCourse-fold4

- Base model: [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- Instruction-tuned base: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- This model: fine-tuned from meta-llama/Llama-3.1-8B-Instruct