# oh_scale_x.25_compute_equal
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.25x dataset. It achieves the following results on the evaluation set:
- Loss: 2.3639
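
The card ships without a usage snippet; below is a minimal inference sketch using the standard `transformers` API. The repo id is inferred from the model name, and the dtype and device placement are illustrative assumptions, not settings taken from the card.

```python
# Minimal inference sketch. Assumptions: the checkpoint is hosted at
# mlfoundations-dev/oh_scale_x.25_compute_equal (the repo id implied by the
# model name); bfloat16 and device_map="auto" are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.25_compute_equal"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Encode a prompt, generate a short continuation, and decode it.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```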
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hedged `TrainingArguments` sketch mirroring them follows the list:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- total_eval_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 46.0
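
The sketch below maps the list above onto `TrainingArguments`, assuming the standard Hugging Face `Trainer` was used. The output directory is hypothetical, and the 8-GPU multi-GPU setup is assumed to come from the launcher (e.g. `torchrun --nproc_per_node=8`) rather than from these arguments.

```python
# Sketch of TrainingArguments mirroring the hyperparameters above.
# Assumptions: standard Hugging Face Trainer; distributed_type "multi-GPU"
# with 8 devices is handled by the launch command, not shown here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="oh_scale_x.25_compute_equal",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 per device x 8 GPUs x 8 steps = 512 effective
    num_train_epochs=46.0,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```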
### Training results
| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.8194        | 0.9986  | 88   | 0.8081          |
| 0.7435        | 1.9972  | 176  | 0.7863          |
| 0.6821        | 2.9957  | 264  | 0.7852          |
| 0.6325        | 3.9943  | 352  | 0.7986          |
| 0.5795        | 4.9929  | 440  | 0.8202          |
| 0.5193        | 5.9915  | 528  | 0.8596          |
| 0.4751        | 6.9901  | 616  | 0.9139          |
| 0.4221        | 8.0     | 705  | 1.0006          |
| 0.3649        | 8.9986  | 793  | 1.0596          |
| 0.3192        | 9.9972  | 881  | 1.1392          |
| 0.2658        | 10.9957 | 969  | 1.2517          |
| 0.2232        | 11.9943 | 1057 | 1.3438          |
| 0.1817        | 12.9929 | 1145 | 1.4416          |
| 0.1418        | 13.9915 | 1233 | 1.5400          |
| 0.1144        | 14.9901 | 1321 | 1.6749          |
| 0.0932        | 16.0    | 1410 | 1.7733          |
| 0.0669        | 16.9986 | 1498 | 1.9060          |
| 0.0506        | 17.9972 | 1586 | 1.9451          |
| 0.0412        | 18.9957 | 1674 | 2.0182          |
| 0.0336        | 19.9943 | 1762 | 2.0949          |
| 0.0299        | 20.9929 | 1850 | 2.1437          |
| 0.0255        | 21.9915 | 1938 | 2.1744          |
| 0.0214        | 22.9901 | 2026 | 2.2531          |
| 0.0183        | 24.0    | 2115 | 2.2672          |
| 0.0176        | 24.9986 | 2203 | 2.2650          |
| 0.0165        | 25.9972 | 2291 | 2.2785          |
| 0.0152        | 26.9957 | 2379 | 2.2726          |
| 0.0141        | 27.9943 | 2467 | 2.3100          |
| 0.0124        | 28.9929 | 2555 | 2.3323          |
| 0.0106        | 29.9915 | 2643 | 2.3571          |
| 0.0091        | 30.9901 | 2731 | 2.4116          |
| 0.0083        | 32.0    | 2820 | 2.5119          |
| 0.0071        | 32.9986 | 2908 | 2.4599          |
| 0.0066        | 33.9972 | 2996 | 2.4769          |
| 0.0061        | 34.9957 | 3084 | 2.4637          |
| 0.0059        | 35.9943 | 3172 | 2.4468          |
| 0.0058        | 36.9929 | 3260 | 2.4387          |
| 0.0056        | 37.9915 | 3348 | 2.4091          |
| 0.0056        | 38.9901 | 3436 | 2.4180          |
| 0.0057        | 40.0    | 3525 | 2.4193          |
| 0.0057        | 40.9986 | 3613 | 2.4677          |
| 0.0058        | 41.9972 | 3701 | 2.3642          |
| 0.0058        | 42.9957 | 3789 | 2.4231          |
| 0.0059        | 43.9943 | 3877 | 2.4137          |
| 0.0057        | 44.9929 | 3965 | 2.4166          |
| 0.0056        | 45.9348 | 4048 | 2.3639          |
### Framework versions
- Transformers 4.46.1
- Pytorch 2.3.0
- Datasets 3.1.0
- Tokenizers 0.20.3
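
For reproducibility, a quick environment check against the versions listed above can look like the following. This snippet is an assumption for convenience, not part of the original card.

```python
# Sanity check (an assumption, not from the card): compare installed package
# versions against the framework versions listed above.
import importlib.metadata as md

expected = {
    "transformers": "4.46.1",
    "torch": "2.3.0",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
for pkg, want in expected.items():
    have = md.version(pkg)  # raises PackageNotFoundError if pkg is absent
    print(f"{pkg}: installed {have}, card lists {want}")
```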