strickvl committed on commit 7fbb55d · verified · 1 Parent(s): a774a42

End of training

Files changed (2):
  1. README.md +161 -0
  2. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,161 @@
---
license: apache-2.0
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model-index:
- name: isafpr-tiny-llama-lora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

data_seed: 42
seed: 42

datasets:
  - path: data/isaf_press_releases_ft.jsonl
    conversation: alpaca
    type: sharegpt
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/tiny-llama/lora-out
hub_model_id: strickvl/isafpr-tiny-llama-lora

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: isaf_pr_ft
wandb_entity: strickvl
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

```

</details><br>

# isafpr-tiny-llama-lora

This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on a dataset of ISAF press releases (`data/isaf_press_releases_ft.jsonl` in the config above).
It achieves the following results on the evaluation set:
- Loss: 0.0212

## Model description

This is a LoRA adapter for TinyLlama-1.1B, trained with Axolotl 0.4.1 on ISAF press-release data. Per the config above, the adapter uses rank 32, alpha 16, dropout 0.05, and targets all linear layers; the base model was loaded in 4-bit precision during training.

## Intended uses & limitations

More information needed
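
As a purely illustrative addition (not part of the generated card), a LoRA adapter published with PEFT can typically be loaded as in the sketch below. The dtype, device placement, and prompt layout are assumptions; adjust them to match how the model was actually trained and intended to be used.

```python
# Illustrative only: load the TinyLlama base model plus this LoRA adapter via
# PEFT and run one generation. The prompt is a placeholder, since the exact
# instruction format used in training is not recorded in this card.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "strickvl/isafpr-tiny-llama-lora"

# AutoPeftModelForCausalLM reads the adapter config, downloads the referenced
# TinyLlama base model, and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
)

prompt = "<instruction and ISAF press release text here>"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that training loaded the base model in 4-bit (`load_in_4bit: true`), so loading in bf16 as above is itself an assumption and may behave slightly differently from the quantised training-time setup.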

## Training and evaluation data

The training data is not published with this card. The config above points to a local file, `data/isaf_press_releases_ft.jsonl`, loaded with Axolotl's `sharegpt` dataset type and the `alpaca` conversation template, with 5% of examples held out for evaluation (`val_set_size: 0.05`).
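
As an illustration only, a sharegpt-type dataset in Axolotl is usually a JSONL file in which each line holds a `conversations` list of `from`/`value` turns. The field names below follow that convention, but the actual roles and contents of `isaf_press_releases_ft.jsonl` are assumptions.

```python
# Hypothetical illustration of one training record in the sharegpt format that
# `type: sharegpt` expects; the real contents of isaf_press_releases_ft.jsonl
# are not published, so the text below is a placeholder.
import json

record = {
    "conversations": [
        {"from": "human", "value": "<ISAF press release text>"},
        {"from": "gpt", "value": "<target output for this press release>"},
    ]
}

# Each line of the JSONL file would be one such JSON object.
print(json.dumps(record))
```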

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16 (see the note after this list)
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
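
The two aggregate batch sizes in the list are derived rather than set directly: the total train batch size is the per-device micro-batch size times the gradient-accumulation steps times the number of devices, and evaluation does not accumulate gradients. A quick check of that arithmetic:

```python
# Derivation of the aggregate batch sizes reported above (values from the config).
micro_batch_size = 2             # per-device train batch size
gradient_accumulation_steps = 4  # micro-batches accumulated per optimizer step
num_devices = 2                  # GPUs in the multi-GPU run
eval_batch_size = 2              # per-device eval batch size

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size, total_eval_batch_size)  # 16 4
```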

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8068        | 0.0227 | 1    | 0.8529          |
| 0.4759        | 0.25   | 11   | 0.4152          |
| 0.0851        | 0.5    | 22   | 0.0833          |
| 0.0385        | 0.75   | 33   | 0.0434          |
| 0.0321        | 1.0    | 44   | 0.0365          |
| 0.0326        | 1.1705 | 55   | 0.0315          |
| 0.1114        | 1.4205 | 66   | 0.0283          |
| 0.0275        | 1.6705 | 77   | 0.0261          |
| 0.0282        | 1.9205 | 88   | 0.0246          |
| 0.0206        | 2.0909 | 99   | 0.0237          |
| 0.0675        | 2.3409 | 110  | 0.0228          |
| 0.0201        | 2.5909 | 121  | 0.0222          |
| 0.0176        | 2.8409 | 132  | 0.0218          |
| 0.0941        | 3.0114 | 143  | 0.0214          |
| 0.0262        | 3.2614 | 154  | 0.0213          |
| 0.051         | 3.5114 | 165  | 0.0213          |
| 0.0184        | 3.7614 | 176  | 0.0212          |

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f999cad7e14f1e3ae89430a9cad5ba5d51d1d035a409ac2f6b22a690a6ef6baa
size 101036698