---
library_name: transformers
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: oh_scale_x.125_compute_equal
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# oh_scale_x.125_compute_equal

This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.125x dataset.
At the final checkpoint (step 4183, epoch 88.76), it achieves the following results on the evaluation set:
- Loss: 2.0839

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- total_eval_batch_size: 64
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 89.0
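
The derived values in this list can be checked with a little arithmetic. The sketch below assumes that `train_batch_size` and `eval_batch_size` are per-device values, which is consistent with the reported totals, and it infers the number of optimizer steps per epoch from the results table (step 377 is logged at epoch 8.0). The variable names are illustrative only.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8             # assumed to be the per-device batch size
eval_batch_size = 8              # assumed to be the per-device batch size
num_devices = 8
gradient_accumulation_steps = 8

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Steps per epoch inferred from the results table: step 377 is logged at epoch 8.0.
steps_per_epoch = 377 / 8
final_epoch = 4183 / steps_per_epoch  # the last logged step is 4183

print(total_train_batch_size)   # 512
print(total_eval_batch_size)    # 64
print(round(final_epoch, 4))    # 88.7639
```

This also suggests why the last logged epoch is 88.7639 rather than the configured 89.0: with gradient accumulation, each epoch corresponds to a fractional number of optimizer steps.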

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.8588        | 0.9973  | 47   | 0.8431          |
| 0.7685        | 1.9947  | 94   | 0.8078          |
| 0.7039        | 2.9920  | 141  | 0.8061          |
| 0.6431        | 3.9894  | 188  | 0.8146          |
| 0.6047        | 4.9867  | 235  | 0.8365          |
| 0.5574        | 5.9841  | 282  | 0.8701          |
| 0.5092        | 6.9814  | 329  | 0.8984          |
| 0.4572        | 8.0     | 377  | 0.9556          |
| 0.4085        | 8.9973  | 424  | 1.0193          |
| 0.349         | 9.9947  | 471  | 1.1014          |
| 0.2917        | 10.9920 | 518  | 1.1841          |
| 0.2371        | 11.9894 | 565  | 1.2766          |
| 0.1947        | 12.9867 | 612  | 1.4154          |
| 0.1574        | 13.9841 | 659  | 1.5165          |
| 0.1248        | 14.9814 | 706  | 1.6125          |
| 0.0949        | 16.0    | 754  | 1.7871          |
| 0.072         | 16.9973 | 801  | 1.8431          |
| 0.0557        | 17.9947 | 848  | 1.8931          |
| 0.0476        | 18.9920 | 895  | 1.8831          |
| 0.0389        | 19.9894 | 942  | 2.0265          |
| 0.0326        | 20.9867 | 989  | 2.0191          |
| 0.0289        | 21.9841 | 1036 | 2.0776          |
| 0.0241        | 22.9814 | 1083 | 2.1365          |
| 0.0224        | 24.0    | 1131 | 2.1633          |
| 0.0186        | 24.9973 | 1178 | 2.1493          |
| 0.0168        | 25.9947 | 1225 | 2.1881          |
| 0.0165        | 26.9920 | 1272 | 2.2118          |
| 0.0149        | 27.9894 | 1319 | 2.1890          |
| 0.0138        | 28.9867 | 1366 | 2.2228          |
| 0.0124        | 29.9841 | 1413 | 2.2381          |
| 0.0099        | 30.9814 | 1460 | 2.2632          |
| 0.0082        | 32.0    | 1508 | 2.3145          |
| 0.0074        | 32.9973 | 1555 | 2.3310          |
| 0.0063        | 33.9947 | 1602 | 2.2894          |
| 0.0058        | 34.9920 | 1649 | 2.3082          |
| 0.0051        | 35.9894 | 1696 | 2.3288          |
| 0.0048        | 36.9867 | 1743 | 2.3887          |
| 0.0047        | 37.9841 | 1790 | 2.3353          |
| 0.0046        | 38.9814 | 1837 | 2.3314          |
| 0.0046        | 40.0    | 1885 | 2.3529          |
| 0.0046        | 40.9973 | 1932 | 2.2960          |
| 0.0044        | 41.9947 | 1979 | 2.2470          |
| 0.0046        | 42.9920 | 2026 | 2.2445          |
| 0.0047        | 43.9894 | 2073 | 2.1857          |
| 0.0046        | 44.9867 | 2120 | 2.2821          |
| 0.0044        | 45.9841 | 2167 | 2.1947          |
| 0.0046        | 46.9814 | 2214 | 2.2448          |
| 0.0046        | 48.0    | 2262 | 2.2752          |
| 0.0045        | 48.9973 | 2309 | 2.1920          |
| 0.0043        | 49.9947 | 2356 | 2.2769          |
| 0.0046        | 50.9920 | 2403 | 2.1450          |
| 0.0047        | 51.9894 | 2450 | 2.1438          |
| 0.0045        | 52.9867 | 2497 | 2.2089          |
| 0.0046        | 53.9841 | 2544 | 2.1234          |
| 0.0043        | 54.9814 | 2591 | 2.0988          |
| 0.0042        | 56.0    | 2639 | 2.2262          |
| 0.0041        | 56.9973 | 2686 | 2.1830          |
| 0.0043        | 57.9947 | 2733 | 2.0565          |
| 0.0044        | 58.9920 | 2780 | 2.1350          |
| 0.0042        | 59.9894 | 2827 | 2.1475          |
| 0.004         | 60.9867 | 2874 | 2.1590          |
| 0.0039        | 61.9841 | 2921 | 2.1752          |
| 0.0043        | 62.9814 | 2968 | 2.0756          |
| 0.0038        | 64.0    | 3016 | 2.1629          |
| 0.0038        | 64.9973 | 3063 | 2.1522          |
| 0.0036        | 65.9947 | 3110 | 2.1449          |
| 0.0035        | 66.9920 | 3157 | 2.1889          |
| 0.0035        | 67.9894 | 3204 | 2.0248          |
| 0.0034        | 68.9867 | 3251 | 2.1538          |
| 0.0034        | 69.9841 | 3298 | 2.1202          |
| 0.0035        | 70.9814 | 3345 | 2.0326          |
| 0.0035        | 72.0    | 3393 | 2.1360          |
| 0.0036        | 72.9973 | 3440 | 2.1404          |
| 0.0036        | 73.9947 | 3487 | 2.0651          |
| 0.0035        | 74.9920 | 3534 | 2.0982          |
| 0.0033        | 75.9894 | 3581 | 2.1032          |
| 0.0034        | 76.9867 | 3628 | 2.1028          |
| 0.0032        | 77.9841 | 3675 | 2.1282          |
| 0.0031        | 78.9814 | 3722 | 2.0912          |
| 0.0035        | 80.0    | 3770 | 2.0766          |
| 0.0033        | 80.9973 | 3817 | 2.0286          |
| 0.0033        | 81.9947 | 3864 | 2.0421          |
| 0.0034        | 82.9920 | 3911 | 2.1121          |
| 0.0033        | 83.9894 | 3958 | 2.0832          |
| 0.0033        | 84.9867 | 4005 | 2.0629          |
| 0.0034        | 85.9841 | 4052 | 2.1398          |
| 0.0032        | 86.9814 | 4099 | 2.1203          |
| 0.0032        | 88.0    | 4147 | 2.1025          |
| 0.0035        | 88.7639 | 4183 | 2.0839          |
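
The table shows validation loss reaching its minimum early and then rising while training loss keeps falling toward zero, a pattern typical of overfitting. A minimal sketch of picking the lowest-validation-loss row follows; it uses only the first five rows and the final row, copied from the table, and does not assume that a checkpoint was actually saved at each of these steps.

```python
# (epoch, step, validation_loss) for a subset of rows copied from the table above.
rows = [
    (0.9973, 47, 0.8431),
    (1.9947, 94, 0.8078),
    (2.9920, 141, 0.8061),
    (3.9894, 188, 0.8146),
    (4.9867, 235, 0.8365),
    (88.7639, 4183, 2.0839),  # final logged row
]

# Row with the lowest validation loss.
best = min(rows, key=lambda row: row[2])
final = rows[-1]

# How much worse the final evaluation loss is than the best one.
gap = final[2] - best[2]

print(best)           # (2.992, 141, 0.8061)
print(round(gap, 4))  # 1.2778
```

No later row in the full table falls below 0.8061, so the step 141 evaluation is the best one logged.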


### Framework versions

- Transformers 4.46.1
- PyTorch 2.3.0
- Datasets 3.1.0
- Tokenizers 0.20.3