# oh_scale_x.25_compute_equal
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.25x dataset. It achieves the following results on the evaluation set:
- Loss: 2.3639
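
The card ships without a usage snippet; below is a minimal inference sketch using the standard `transformers` API. The repo id is inferred from the model name, and the dtype and device placement are illustrative assumptions, not settings taken from the card.

```python
# Minimal inference sketch. Assumptions: the checkpoint is hosted at
# mlfoundations-dev/oh_scale_x.25_compute_equal (the repo id implied by the
# model name); bfloat16 and device_map="auto" are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.25_compute_equal"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Encode a prompt, generate a short continuation, and decode it.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```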
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hedged `TrainingArguments` sketch mirroring them follows the list:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- total_eval_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 46.0
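
The sketch below maps the list above onto `TrainingArguments`, assuming the standard Hugging Face `Trainer` was used. The output directory is hypothetical, and the 8-GPU multi-GPU setup is assumed to come from the launcher (e.g. `torchrun --nproc_per_node=8`) rather than from these arguments.

```python
# Sketch of TrainingArguments mirroring the hyperparameters above.
# Assumptions: standard Hugging Face Trainer; distributed_type "multi-GPU"
# with 8 devices is handled by the launch command, not shown here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="oh_scale_x.25_compute_equal",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 per device x 8 GPUs x 8 steps = 512 effective
    num_train_epochs=46.0,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```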
### Training results
| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.8194        | 0.9986  | 88   | 0.8081          |
| 0.7435        | 1.9972  | 176  | 0.7863          |
| 0.6821        | 2.9957  | 264  | 0.7852          |
| 0.6325        | 3.9943  | 352  | 0.7986          |
| 0.5795        | 4.9929  | 440  | 0.8202          |
| 0.5193        | 5.9915  | 528  | 0.8596          |
| 0.4751        | 6.9901  | 616  | 0.9139          |
| 0.4221        | 8.0     | 705  | 1.0006          |
| 0.3649        | 8.9986  | 793  | 1.0596          |
| 0.3192        | 9.9972  | 881  | 1.1392          |
| 0.2658        | 10.9957 | 969  | 1.2517          |
| 0.2232        | 11.9943 | 1057 | 1.3438          |
| 0.1817        | 12.9929 | 1145 | 1.4416          |
| 0.1418        | 13.9915 | 1233 | 1.5400          |
| 0.1144        | 14.9901 | 1321 | 1.6749          |
| 0.0932        | 16.0    | 1410 | 1.7733          |
| 0.0669        | 16.9986 | 1498 | 1.9060          |
| 0.0506        | 17.9972 | 1586 | 1.9451          |
| 0.0412        | 18.9957 | 1674 | 2.0182          |
| 0.0336        | 19.9943 | 1762 | 2.0949          |
| 0.0299        | 20.9929 | 1850 | 2.1437          |
| 0.0255        | 21.9915 | 1938 | 2.1744          |
| 0.0214        | 22.9901 | 2026 | 2.2531          |
| 0.0183        | 24.0    | 2115 | 2.2672          |
| 0.0176        | 24.9986 | 2203 | 2.2650          |
| 0.0165        | 25.9972 | 2291 | 2.2785          |
| 0.0152        | 26.9957 | 2379 | 2.2726          |
| 0.0141        | 27.9943 | 2467 | 2.3100          |
| 0.0124        | 28.9929 | 2555 | 2.3323          |
| 0.0106        | 29.9915 | 2643 | 2.3571          |
| 0.0091        | 30.9901 | 2731 | 2.4116          |
| 0.0083        | 32.0    | 2820 | 2.5119          |
| 0.0071        | 32.9986 | 2908 | 2.4599          |
| 0.0066        | 33.9972 | 2996 | 2.4769          |
| 0.0061        | 34.9957 | 3084 | 2.4637          |
| 0.0059        | 35.9943 | 3172 | 2.4468          |
| 0.0058        | 36.9929 | 3260 | 2.4387          |
| 0.0056        | 37.9915 | 3348 | 2.4091          |
| 0.0056        | 38.9901 | 3436 | 2.4180          |
| 0.0057        | 40.0    | 3525 | 2.4193          |
| 0.0057        | 40.9986 | 3613 | 2.4677          |
| 0.0058        | 41.9972 | 3701 | 2.3642          |
| 0.0058        | 42.9957 | 3789 | 2.4231          |
| 0.0059        | 43.9943 | 3877 | 2.4137          |
| 0.0057        | 44.9929 | 3965 | 2.4166          |
| 0.0056        | 45.9348 | 4048 | 2.3639          |
### Framework versions
- Transformers 4.46.1
- Pytorch 2.3.0
- Datasets 3.1.0
- Tokenizers 0.20.3
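
For reproducibility, a quick environment check against the versions listed above can look like the following. This snippet is an assumption for convenience, not part of the original card.

```python
# Sanity check (an assumption, not from the card): compare installed package
# versions against the framework versions listed above.
import importlib.metadata as md

expected = {
    "transformers": "4.46.1",
    "torch": "2.3.0",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
for pkg, want in expected.items():
    have = md.version(pkg)  # raises PackageNotFoundError if pkg is absent
    print(f"{pkg}: installed {have}, card lists {want}")
```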