---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: oh_scale_x.25_compute_equal
  results: []
---

# oh_scale_x.25_compute_equal

This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.25x dataset.
It achieves the following results on the evaluation set:
- Loss: 2.3639

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- total_eval_batch_size: 64
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 46.0

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.8194        | 0.9986  | 88   | 0.8081          |
| 0.7435        | 1.9972  | 176  | 0.7863          |
| 0.6821        | 2.9957  | 264  | 0.7852          |
| 0.6325        | 3.9943  | 352  | 0.7986          |
| 0.5795        | 4.9929  | 440  | 0.8202          |
| 0.5193        | 5.9915  | 528  | 0.8596          |
| 0.4751        | 6.9901  | 616  | 0.9139          |
| 0.4221        | 8.0     | 705  | 1.0006          |
| 0.3649        | 8.9986  | 793  | 1.0596          |
| 0.3192        | 9.9972  | 881  | 1.1392          |
| 0.2658        | 10.9957 | 969  | 1.2517          |
| 0.2232        | 11.9943 | 1057 | 1.3438          |
| 0.1817        | 12.9929 | 1145 | 1.4416          |
| 0.1418        | 13.9915 | 1233 | 1.5400          |
| 0.1144        | 14.9901 | 1321 | 1.6749          |
| 0.0932        | 16.0    | 1410 | 1.7733          |
| 0.0669        | 16.9986 | 1498 | 1.9060          |
| 0.0506        | 17.9972 | 1586 | 1.9451          |
| 0.0412        | 18.9957 | 1674 | 2.0182          |
| 0.0336        | 19.9943 | 1762 | 2.0949          |
| 0.0299        | 20.9929 | 1850 | 2.1437          |
| 0.0255        | 21.9915 | 1938 | 2.1744          |
| 0.0214        | 22.9901 | 2026 | 2.2531          |
| 0.0183        | 24.0    | 2115 | 2.2672          |
| 0.0176        | 24.9986 | 2203 | 2.2650          |
| 0.0165        | 25.9972 | 2291 | 2.2785          |
| 0.0152        | 26.9957 | 2379 | 2.2726          |
| 0.0141        | 27.9943 | 2467 | 2.3100          |
| 0.0124        | 28.9929 | 2555 | 2.3323          |
| 0.0106        | 29.9915 | 2643 | 2.3571          |
| 0.0091        | 30.9901 | 2731 | 2.4116          |
| 0.0083        | 32.0    | 2820 | 2.5119          |
| 0.0071        | 32.9986 | 2908 | 2.4599          |
| 0.0066        | 33.9972 | 2996 | 2.4769          |
| 0.0061        | 34.9957 | 3084 | 2.4637          |
| 0.0059        | 35.9943 | 3172 | 2.4468          |
| 0.0058        | 36.9929 | 3260 | 2.4387          |
| 0.0056        | 37.9915 | 3348 | 2.4091          |
| 0.0056        | 38.9901 | 3436 | 2.4180          |
| 0.0057        | 40.0    | 3525 | 2.4193          |
| 0.0057        | 40.9986 | 3613 | 2.4677          |
| 0.0058        | 41.9972 | 3701 | 2.3642          |
| 0.0058        | 42.9957 | 3789 | 2.4231          |
| 0.0059        | 43.9943 | 3877 | 2.4137          |
| 0.0057        | 44.9929 | 3965 | 2.4166          |
| 0.0056        | 45.9348 | 4048 | 2.3639          |

### Framework versions

- Transformers 4.46.1
- PyTorch 2.3.0
- Datasets 3.1.0
- Tokenizers 0.20.3
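### Training hyperparameters as `TrainingArguments` (illustrative)

The effective train batch size of 512 listed above follows from 8 examples per device × 8 GPUs × 8 gradient-accumulation steps, and the eval batch size of 64 from 8 × 8 with no accumulation. As a minimal sketch, those settings map onto `transformers.TrainingArguments` roughly as below. This is a readability aid, not the original launch config (the tags indicate training ran through LLaMA-Factory); `output_dir` and `bf16` are assumptions not stated on this card.

```python
# Illustrative reconstruction of the hyperparameters listed above.
# Assumption: this mirrors, but is not, the original LLaMA-Factory configuration.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="oh_scale_x.25_compute_equal",  # placeholder path
    learning_rate=5e-6,
    per_device_train_batch_size=8,  # 8 per device x 8 GPUs x 8 accum steps = 512 total
    per_device_eval_batch_size=8,   # 8 per device x 8 GPUs = 64 total
    gradient_accumulation_steps=8,
    lr_scheduler_type="constant",   # constant schedule: no warmup or decay
    num_train_epochs=46.0,
    seed=42,
    bf16=True,                      # assumption: precision is not stated on the card
)
```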
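## How to use (illustrative)

A minimal inference sketch with `transformers`. The repo id below is a placeholder inferred from the model name; replace it with the actual Hub repo id or a local checkpoint path. The dtype and device placement are likewise assumptions.

```python
# Minimal inference sketch. Assumptions: the checkpoint is reachable at `model_id`,
# bf16 weights fit on the available GPU(s), and the saved tokenizer carries
# whatever special tokens the fine-tuning used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "oh_scale_x.25_compute_equal"  # placeholder: real repo id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain gradient accumulation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Since the fine-tuning data is conversational, wrapping the prompt with `tokenizer.apply_chat_template` may give better results if the saved tokenizer ships a chat template.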