---
license: other
library_name: peft
tags:
- llama-factory
- lora
- generated_from_trainer
base_model: /data1/model/llama2/meta-llama/Llama2-13b
model-index:
- name: elementary_math_qa_no_sys
  results: []
---

# elementary_math_qa_no_sys

This model is a fine-tuned version of /data1/model/llama2/meta-llama/Llama2-13b on the elementary_math_qa_no_sys dataset. It achieves the following results on the evaluation set:

- Loss: 0.0705

## Model description

More information needed

## Intended uses & limitations

More information needed
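
No official usage snippet is provided in this card. Below is a minimal, hedged sketch of loading the LoRA adapter with PEFT on top of the base model. The adapter path is a placeholder, the base model path comes from the metadata above and must exist locally, and the prompt format is illustrative, since the exact LLaMA-Factory template used for this run (without a system prompt) is not recorded here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Paths are illustrative: the base path comes from the card metadata and
# must exist locally; the adapter path is a placeholder for wherever these
# LoRA weights are stored.
base_model_path = "/data1/model/llama2/meta-llama/Llama2-13b"
adapter_path = "path/to/elementary_math_qa_no_sys"

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# Prompt format is illustrative; the template used during training is unknown.
prompt = "Question: A class has 3 rows of 8 desks. How many desks are there?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```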

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- total_train_batch_size: 24
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- num_epochs: 10.0
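
As a rough sketch, the listed values map onto `transformers.TrainingArguments` roughly as below. This is illustrative only: the run was driven by LLaMA-Factory, whose exact invocation, output paths, and LoRA settings are not recorded in this card, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters listed above. The "multi-GPU"
# distributed type with 3 devices is handled by the launcher (e.g. torchrun),
# not by these arguments.
args = TrainingArguments(
    output_dir="elementary_math_qa_no_sys",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,   # x 3 GPUs -> total train batch size 24
    per_device_eval_batch_size=8,    # x 3 GPUs -> total eval batch size 24
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=20,
    num_train_epochs=10.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```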

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.3438 | 0.05 | 50 | 0.3364 |
| 0.3021 | 0.09 | 100 | 0.3056 |
| 0.2676 | 0.14 | 150 | 0.2710 |
| 0.2679 | 0.18 | 200 | 0.2582 |
| 0.2448 | 0.23 | 250 | 0.2475 |
| 0.2355 | 0.28 | 300 | 0.2372 |
| 0.2335 | 0.32 | 350 | 0.2282 |
| 0.2226 | 0.37 | 400 | 0.2227 |
| 0.2098 | 0.42 | 450 | 0.2109 |
| 0.1839 | 0.46 | 500 | 0.2048 |
| 0.2008 | 0.51 | 550 | 0.1992 |
| 0.1945 | 0.55 | 600 | 0.2019 |
| 0.1891 | 0.6 | 650 | 0.1859 |
| 0.2015 | 0.65 | 700 | 0.1966 |
| 0.174 | 0.69 | 750 | 0.1801 |
| 0.1565 | 0.74 | 800 | 0.1762 |
| 0.1825 | 0.79 | 850 | 0.1717 |
| 0.1651 | 0.83 | 900 | 0.1682 |
| 0.1598 | 0.88 | 950 | 0.1598 |
| 0.1502 | 0.92 | 1000 | 0.1558 |
| 0.1599 | 0.97 | 1050 | 0.1465 |
| 0.0977 | 1.02 | 1100 | 0.1520 |
| 0.1166 | 1.06 | 1150 | 0.1403 |
| 0.0943 | 1.11 | 1200 | 0.1387 |
| 0.1007 | 1.16 | 1250 | 0.1311 |
| 0.1035 | 1.2 | 1300 | 0.1325 |
| 0.0842 | 1.25 | 1350 | 0.1309 |
| 0.1114 | 1.29 | 1400 | 0.1225 |
| 0.1047 | 1.34 | 1450 | 0.1184 |
| 0.0807 | 1.39 | 1500 | 0.1136 |
| 0.0846 | 1.43 | 1550 | 0.1200 |
| 0.0737 | 1.48 | 1600 | 0.1145 |
| 0.0844 | 1.52 | 1650 | 0.1037 |
| 0.0809 | 1.57 | 1700 | 0.0940 |
| 0.0718 | 1.62 | 1750 | 0.0931 |
| 0.0687 | 1.66 | 1800 | 0.0930 |
| 0.0629 | 1.71 | 1850 | 0.0969 |
| 0.0852 | 1.76 | 1900 | 0.0872 |
| 0.0622 | 1.8 | 1950 | 0.0849 |
| 0.0653 | 1.85 | 2000 | 0.0831 |
| 0.0507 | 1.89 | 2050 | 0.0829 |
| 0.0518 | 1.94 | 2100 | 0.0785 |
| 0.0566 | 1.99 | 2150 | 0.0750 |
| 0.0193 | 2.03 | 2200 | 0.0837 |
| 0.0233 | 2.08 | 2250 | 0.0766 |
| 0.0249 | 2.13 | 2300 | 0.0829 |
| 0.0217 | 2.17 | 2350 | 0.0824 |
| 0.0233 | 2.22 | 2400 | 0.0735 |
| 0.0192 | 2.26 | 2450 | 0.0767 |
| 0.0207 | 2.31 | 2500 | 0.0794 |
| 0.0232 | 2.36 | 2550 | 0.0843 |
| 0.0295 | 2.4 | 2600 | 0.0800 |
| 0.0185 | 2.45 | 2650 | 0.0777 |
| 0.0178 | 2.5 | 2700 | 0.0767 |
| 0.0245 | 2.54 | 2750 | 0.0717 |
| 0.0226 | 2.59 | 2800 | 0.0774 |
| 0.0222 | 2.63 | 2850 | 0.0671 |
| 0.0194 | 2.68 | 2900 | 0.0666 |
| 0.0162 | 2.73 | 2950 | 0.0713 |
| 0.0184 | 2.77 | 3000 | 0.0740 |
| 0.0227 | 2.82 | 3050 | 0.0675 |
| 0.0176 | 2.87 | 3100 | 0.0701 |
| 0.034 | 2.91 | 3150 | 0.0675 |
| 0.0148 | 2.96 | 3200 | 0.0688 |
| 0.014 | 3.0 | 3250 | 0.0673 |
| 0.0178 | 3.05 | 3300 | 0.0719 |
| 0.0059 | 3.1 | 3350 | 0.0734 |
| 0.0069 | 3.14 | 3400 | 0.0764 |
| 0.0074 | 3.19 | 3450 | 0.0818 |
| 0.009 | 3.23 | 3500 | 0.0705 |
| 0.0048 | 3.28 | 3550 | 0.0735 |
| 0.005 | 3.33 | 3600 | 0.0705 |
| 0.0073 | 3.37 | 3650 | 0.0724 |

### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1
- Datasets 2.18.0
- Tokenizers 0.15.2
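
As a small sanity check, the following snippet prints the locally installed versions for comparison against the pins above:

```python
# Print installed versions to compare against the pins listed in this card.
import datasets, peft, tokenizers, torch, transformers

for name, mod in [("PEFT", peft), ("Transformers", transformers),
                  ("Pytorch", torch), ("Datasets", datasets),
                  ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```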