SystemAdmin123 committed
Commit e7d262f · verified · 1 Parent(s): 7baed21

End of training

Files changed (1)
  1. README.md +15 -27
README.md CHANGED
@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
  axolotl version: `0.6.0`
  ```yaml
  base_model: trl-internal-testing/tiny-random-LlamaForCausalLM
- batch_size: 64
+ batch_size: 128
  bf16: true
  chat_template: tokenizer_default_fallback_alpaca
  datasets:
@@ -36,7 +36,7 @@ datasets:
  system_prompt: ''
  device_map: auto
  eval_sample_packing: false
- eval_steps: 50
+ eval_steps: 200
  flash_attention: true
  gradient_checkpointing: true
  group_by_length: true
@@ -45,7 +45,7 @@ hub_strategy: checkpoint
  learning_rate: 0.0002
  logging_steps: 10
  lr_scheduler: cosine
- max_steps: 5000
+ max_steps: 10000
  micro_batch_size: 32
  model_type: AutoModelForCausalLM
  num_epochs: 100
@@ -54,11 +54,13 @@ output_dir: /root/.sn56/axolotl/tmp/tiny-random-LlamaForCausalLM
  pad_to_sequence_len: true
  resize_token_embeddings_to_32x: false
  sample_packing: true
- save_steps: 50
- save_total_limit: 2
+ save_steps: 200
+ save_total_limit: 1
  sequence_len: 2048
  tokenizer_type: LlamaTokenizerFast
  torch_dtype: bf16
+ training_args_kwargs:
+   hub_private_repo: true
  trust_remote_code: true
  val_set_size: 0.1
  wandb_entity: ''
@@ -76,8 +78,6 @@ warmup_ratio: 0.05
  # tiny-random-LlamaForCausalLM
 
  This model is a fine-tuned version of [trl-internal-testing/tiny-random-LlamaForCausalLM](https://huggingface.co/trl-internal-testing/tiny-random-LlamaForCausalLM) on the argilla/databricks-dolly-15k-curated-en dataset.
- It achieves the following results on the evaluation set:
- - Loss: 9.1943
 
  ## Model description
 
@@ -101,31 +101,19 @@ The following hyperparameters were used during training:
  - eval_batch_size: 32
  - seed: 42
  - distributed_type: multi-GPU
- - num_devices: 2
- - total_train_batch_size: 64
- - total_eval_batch_size: 64
+ - num_devices: 4
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 128
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 30
- - training_steps: 600
+ - lr_scheduler_warmup_steps: 5
+ - training_steps: 100
 
  ### Training results
 
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-------:|:----:|:---------------:|
- | No log | 0.0769 | 1 | 10.3764 |
- | 10.3159 | 3.8462 | 50 | 10.2852 |
- | 9.998 | 7.6923 | 100 | 9.9738 |
- | 9.7359 | 11.5385 | 150 | 9.7190 |
- | 9.5151 | 15.3846 | 200 | 9.5042 |
- | 9.3407 | 19.2308 | 250 | 9.3411 |
- | 9.2338 | 23.0769 | 300 | 9.2415 |
- | 9.1896 | 26.9231 | 350 | 9.2039 |
- | 9.18 | 30.7692 | 400 | 9.1960 |
- | 9.1777 | 34.6154 | 450 | 9.1957 |
- | 9.1781 | 38.4615 | 500 | 9.1931 |
- | 9.1761 | 42.3077 | 550 | 9.1936 |
- | 9.1762 | 46.1538 | 600 | 9.1943 |
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | No log | 0.1667 | 1 | 10.3764 |
 
 
  ### Framework versions
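For context on the batch-size figures changed in this commit: axolotl derives the effective (total) train batch size from the per-device micro batch size and the number of devices, which is why `batch_size` moves from 64 to 128 when `num_devices` goes from 2 to 4 with `micro_batch_size: 32` unchanged. A minimal sketch of that arithmetic, assuming `gradient_accumulation_steps` is 1 since the config does not set it:

```python
# Sketch of how the total_train_batch_size in the hyperparameter list is derived.
# Values come from the config above; gradient_accumulation_steps = 1 is an assumption.
micro_batch_size = 32            # per-device batch size from the config
num_devices = 4                  # GPUs after this commit (was 2 before)
gradient_accumulation_steps = 1  # assumed default, not set in the config

total_train_batch_size = micro_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)    # 128, matching batch_size / total_train_batch_size
```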
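To try the resulting checkpoint, it can be loaded with the standard transformers API. The repository id below is a placeholder guess, since the target repo is not named in this diff; `trust_remote_code` and the bf16 dtype mirror the config:

```python
# Minimal sketch for loading the fine-tuned checkpoint with transformers.
# "SystemAdmin123/tiny-random-LlamaForCausalLM" is a hypothetical repo id;
# replace it with the repository this commit was actually pushed to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "SystemAdmin123/tiny-random-LlamaForCausalLM"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # matches torch_dtype: bf16 in the config
    trust_remote_code=True,      # matches trust_remote_code: true in the config
)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```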