oh_scale_x.25_compute_equal

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.25x dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3639
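
The checkpoint can be loaded with the standard transformers API. The sketch below is illustrative only: it assumes the weights are public on the Hugging Face Hub under mlfoundations-dev/oh_scale_x.25_compute_equal and that the BF16 weights fit on the available GPU(s); the prompt is a placeholder.

```python
# Minimal inference sketch (assumptions: public Hub checkpoint, enough GPU memory).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.25_compute_equal"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

# Hypothetical prompt, just to exercise generation.
inputs = tokenizer("Explain gradient accumulation in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```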

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 46.0
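
For reference, here is a minimal sketch of how these values map onto transformers' TrainingArguments. The output_dir and eval_strategy are assumptions (the actual training script is not published on this card), and the 8-GPU distribution comes from the launcher (e.g. torchrun or accelerate), not from this config object.

```python
# Configuration sketch matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="oh_scale_x.25_compute_equal",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 devices * 8 per-device * 8 accumulation = 512 effective
    lr_scheduler_type="constant",
    num_train_epochs=46.0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                      # the card lists BF16 tensors
    eval_strategy="epoch",          # assumption: evaluation was logged once per epoch
)
```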

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.8194        | 0.9986  | 88   | 0.8081          |
| 0.7435        | 1.9972  | 176  | 0.7863          |
| 0.6821        | 2.9957  | 264  | 0.7852          |
| 0.6325        | 3.9943  | 352  | 0.7986          |
| 0.5795        | 4.9929  | 440  | 0.8202          |
| 0.5193        | 5.9915  | 528  | 0.8596          |
| 0.4751        | 6.9901  | 616  | 0.9139          |
| 0.4221        | 8.0     | 705  | 1.0006          |
| 0.3649        | 8.9986  | 793  | 1.0596          |
| 0.3192        | 9.9972  | 881  | 1.1392          |
| 0.2658        | 10.9957 | 969  | 1.2517          |
| 0.2232        | 11.9943 | 1057 | 1.3438          |
| 0.1817        | 12.9929 | 1145 | 1.4416          |
| 0.1418        | 13.9915 | 1233 | 1.5400          |
| 0.1144        | 14.9901 | 1321 | 1.6749          |
| 0.0932        | 16.0    | 1410 | 1.7733          |
| 0.0669        | 16.9986 | 1498 | 1.9060          |
| 0.0506        | 17.9972 | 1586 | 1.9451          |
| 0.0412        | 18.9957 | 1674 | 2.0182          |
| 0.0336        | 19.9943 | 1762 | 2.0949          |
| 0.0299        | 20.9929 | 1850 | 2.1437          |
| 0.0255        | 21.9915 | 1938 | 2.1744          |
| 0.0214        | 22.9901 | 2026 | 2.2531          |
| 0.0183        | 24.0    | 2115 | 2.2672          |
| 0.0176        | 24.9986 | 2203 | 2.2650          |
| 0.0165        | 25.9972 | 2291 | 2.2785          |
| 0.0152        | 26.9957 | 2379 | 2.2726          |
| 0.0141        | 27.9943 | 2467 | 2.3100          |
| 0.0124        | 28.9929 | 2555 | 2.3323          |
| 0.0106        | 29.9915 | 2643 | 2.3571          |
| 0.0091        | 30.9901 | 2731 | 2.4116          |
| 0.0083        | 32.0    | 2820 | 2.5119          |
| 0.0071        | 32.9986 | 2908 | 2.4599          |
| 0.0066        | 33.9972 | 2996 | 2.4769          |
| 0.0061        | 34.9957 | 3084 | 2.4637          |
| 0.0059        | 35.9943 | 3172 | 2.4468          |
| 0.0058        | 36.9929 | 3260 | 2.4387          |
| 0.0056        | 37.9915 | 3348 | 2.4091          |
| 0.0056        | 38.9901 | 3436 | 2.4180          |
| 0.0057        | 40.0    | 3525 | 2.4193          |
| 0.0057        | 40.9986 | 3613 | 2.4677          |
| 0.0058        | 41.9972 | 3701 | 2.3642          |
| 0.0058        | 42.9957 | 3789 | 2.4231          |
| 0.0059        | 43.9943 | 3877 | 2.4137          |
| 0.0057        | 44.9929 | 3965 | 2.4166          |
| 0.0056        | 45.9348 | 4048 | 2.3639          |
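
Read across the table, the validation loss bottoms out at 0.7852 around epoch 3 and climbs steadily afterward while the training loss keeps falling. A quick way to see this is to plot the two columns; the snippet below is illustrative and abbreviates the data to the first five epochs (approximate epoch indices; full values are in the table above, and matplotlib is assumed to be installed).

```python
# Sketch: visualize the widening train/validation gap from the results table.
import matplotlib.pyplot as plt

epochs = [1, 2, 3, 4, 5]  # abbreviated; extend with the remaining table rows
train_loss = [0.8194, 0.7435, 0.6821, 0.6325, 0.5795]
val_loss = [0.8081, 0.7863, 0.7852, 0.7986, 0.8202]

plt.plot(epochs, train_loss, label="training loss")
plt.plot(epochs, val_loss, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```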

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.3.0
  • Datasets 3.1.0
  • Tokenizers 0.20.3