secemp9 committed on
Commit 4c7e018 · verified · 1 Parent(s): 62388d1

Update README.md

  1. README.md +147 -4
README.md CHANGED
@@ -23,11 +23,154 @@ It has many goals in mind, but mainly:
  - converting any non-reasoning model output/datasets to a reasoning synthetic dataset when used as input
 
  So far, the current proof of concept manages to check the boxes for goals 1 and 3, and I plan on scaling it further, since:
- - this only use Mistral nemo 12b as base
- - Was only trained for 2 epoch
- - Only 200k samples were used for finetuning (Qlora)
  So there is still much room for improvement
 
  This was trained using both the instruction and the solution as input, with the output being a plausible, matching reasoning trace based on them.
 
- I believe this is the future of reasoning data generation. Stay tuned for an eval release
 
  - converting any non-reasoning model output/datasets to a reasoning synthetic dataset when used as input
 
  So far, the current proof of concept manages to check the boxes for goals 1 and 3, and I plan on scaling it further, since:
+ - This only uses Mistral Nemo 12B as the base model
+ - It was only trained for 2 epochs
+ - Only 200k samples were used for fine-tuning (QLoRA); the dataset is at https://huggingface.co/datasets/secemp9/instruction_solution_thought
+
  So there is still much room for improvement
 
  This was trained using both the instruction and the solution as input, with the output being a plausible, matching reasoning trace based on them.
 
+ I believe this is the future of reasoning data generation. Stay tuned for an eval release.
+
+ Here is an inference example, using a ChatGPT instruction + solution as input:
+
+ # Inference Example
+ Here I use a simple example from ChatGPT, passing both the instruction and the solution as input to the model:
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65986192b0c5357368bacbf8/rtuYmWGw8lk09AQi_dpX8.png)
+
+ # Dataset Example
+
+ The dataset format pairs an instruction + solution (input) with a reasoning trace (output).
+ Sample conversation:
+ ```
+ {
+   "messages": [
+     {
+       "role": "user",
+       "content": "Instruction:\ntext_here\n\nSolution:\ntext_here"
+     },
+     {
+       "role": "assistant",
+       "content": "text_here"
+     }
+   ]
+ }
+ ```
+ which looks like:
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65986192b0c5357368bacbf8/GdbZxeLSDsJmZDHJ8SN-g.png)
+
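As a minimal sketch of the record format above (not the script used to build the published dataset), a record could be assembled and written as a single JSONL line like this. The helper names `make_record` and `to_jsonl_line` and the sample strings are hypothetical.

```python
import json


def make_record(user_content: str, reasoning: str) -> dict:
    """Build one chat-style record: user turn in, reasoning trace out.

    Hypothetical helper; only the "messages"/"role"/"content" layout
    comes from the sample conversation shown above.
    """
    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": reasoning},
        ]
    }


def to_jsonl_line(record: dict) -> str:
    # json.dumps escapes embedded newlines as \n, so each record
    # occupies exactly one physical line of the .jsonl file.
    return json.dumps(record, ensure_ascii=False)


# Illustrative strings only; they are not taken from the dataset.
record = make_record(
    "Instruction:\nWhat is 2 + 2?\n\nSolution:\n4",
    "Adding 2 and 2 gives 4.",
)
line = to_jsonl_line(record)
restored = json.loads(line)  # round-trips back to the same dict
```

Serializing through `json.dumps` rather than writing the string by hand keeps multi-line instructions and solutions valid JSON.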
+ # Prompt Format
+
+ For the prompt format, I tried not to overengineer, though I'm sure there is a better way to format this.
+
+ For now it's just:
+ Instruction:
+
+ Solution:
+
+ The model's output doesn't have any formatting for now; it is just the reasoning trace.
+
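The prompt layout above can be sketched as a pair of small helpers. This is only an illustration: the function names are hypothetical, and the exact whitespace (one newline after each label, one blank line before `Solution:`) is an assumption based on the dataset sample, not a confirmed specification.

```python
from typing import Tuple

INSTRUCTION_LABEL = "Instruction:\n"
SOLUTION_SEPARATOR = "\n\nSolution:\n"


def format_prompt(instruction: str, solution: str) -> str:
    """Join an instruction and its solution into the user prompt."""
    return f"{INSTRUCTION_LABEL}{instruction}{SOLUTION_SEPARATOR}{solution}"


def parse_prompt(prompt: str) -> Tuple[str, str]:
    """Split a prompt back into (instruction, solution).

    Splits at the first separator, so an instruction that itself
    contains the separator text would be split too early.
    """
    head, sep, solution = prompt.partition(SOLUTION_SEPARATOR)
    if not sep or not head.startswith(INSTRUCTION_LABEL):
        raise ValueError("prompt does not follow the Instruction/Solution layout")
    return head[len(INSTRUCTION_LABEL):], solution


prompt = format_prompt("Reverse the string 'abc'.", "cba")
instruction, solution = parse_prompt(prompt)
```

Since the model's output is plain reasoning text, no matching parser is needed on the output side.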
+ # Axolotl config
+
+ For this, I basically tried to convert my Unsloth code to an Axolotl config file. I also used DeepSpeed. The configuration is below:
+
+ config.yml
+ ```
+ # Base model configuration
+ base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
+ load_in_4bit: true
+
+ # Dataset configuration
+ datasets:
+   - path: instruction_solution_to_thought_dataset.jsonl
+     type: chat_template
+
+ # Chat template
+ chat_template: chatml
+
+ # LoRA adapter configuration
+ adapter: lora
+ lora_r: 16
+ lora_alpha: 16
+ lora_dropout: 0
+ lora_target_modules:
+   - q_proj
+   - k_proj
+   - v_proj
+   - o_proj
+   - gate_proj
+   - up_proj
+   - down_proj
+
+ # Training hyperparameters
+ max_seq_length: 128000
+ micro_batch_size: 2
+ gradient_accumulation_steps: 8
+ learning_rate: 3e-5
+ num_epochs: 3
+ warmup_steps: 100
+ optimizer: adamw_8bit
+ weight_decay: 0.01
+ lr_scheduler_type: cosine
+ max_grad_norm: 1.0
+ output_dir: ./outputs_solution_to_thought
+ seed: 3407
+ merge_lora: true
+ hf_upload: true
+ hf_repo: secemp9/TraceBack-12b
+ xformers_attention:
+ flash_attention: True
+ bf16: true # Enable BF16 mixed precision
+ # Multi-GPU training with DeepSpeed
+ deepspeed: deepspeed_configs/zero2.json
+
+ # Optional: Enable gradient checkpointing
+ gradient_checkpointing: true
+ ```
+
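To make the batch-related settings in this config concrete, here is a rough arithmetic sketch. The GPU count is not stated anywhere in the README, so `num_gpus=4` is purely an assumption, and the 200k sample count is the approximate figure quoted earlier. The LoRA scale uses the standard `lora_alpha / lora_r` definition.

```python
import math

# Values copied from config.yml above.
MICRO_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
LORA_R = 16
LORA_ALPHA = 16

# Approximate dataset size quoted in the README.
NUM_SAMPLES = 200_000


def effective_batch_size(num_gpus: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_gpus


def steps_per_epoch(num_samples: int, num_gpus: int) -> int:
    """Optimizer steps for one pass over the data.

    Ignores sample packing and any filtering of over-long samples.
    """
    return math.ceil(num_samples / effective_batch_size(num_gpus))


# Standard LoRA scaling factor applied to the adapter update.
lora_scale = LORA_ALPHA / LORA_R

# num_gpus=4 is an assumed value for illustration only.
batch = effective_batch_size(num_gpus=4)
steps = steps_per_epoch(NUM_SAMPLES, num_gpus=4)
```

With `lora_alpha` equal to `lora_r`, the adapter update is applied at scale 1, so changing the rank alone would also change the effective scale unless `lora_alpha` is adjusted with it.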
+ deepspeed_configs/zero2.json
+ ```
+ {
+   "zero_optimization": {
+     "stage": 2,
+     "allgather_partitions": true,
+     "allgather_bucket_size": 2e8,
+     "overlap_comm": true,
+     "reduce_scatter": true,
+     "reduce_bucket_size": 2e8,
+     "contiguous_gradients": true
+   },
+   "bf16": {
+     "enabled": true
+   },
+   "optimizer": {
+     "type": "AdamW",
+     "params": {
+       "lr": "auto",
+       "weight_decay": "auto",
+       "betas": [0.9, 0.999],
+       "eps": 1e-8
+     }
+   },
+   "scheduler": {
+     "type": "WarmupLR",
+     "params": {
+       "warmup_min_lr": 0,
+       "warmup_max_lr": "auto",
+       "warmup_num_steps": "auto"
+     }
+   },
+   "train_micro_batch_size_per_gpu": "auto",
+   "gradient_accumulation_steps": "auto",
+   "steps_per_print": 10,
+   "wandb": {
+     "enabled": true
+   }
+   }
+ ```
+
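Several values in this DeepSpeed config are set to `"auto"`, which the Hugging Face Trainer integration fills in from the training arguments (learning rate, batch size, and so on). As a small sketch, assuming the JSON is loaded as shown, the following lists which fields are deferred that way. The helper name `find_auto_fields` is hypothetical.

```python
import json

# Copy of deepspeed_configs/zero2.json from above.
ZERO2_JSON = """
{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "bf16": {"enabled": true},
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 10,
  "wandb": {"enabled": true}
}
"""


def find_auto_fields(node, prefix=""):
    """Return dotted paths of every dict entry whose value is "auto"."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if value == "auto":
                paths.append(path)
            else:
                # Recurse into nested sections; non-dict values yield nothing.
                paths.extend(find_auto_fields(value, path))
    return paths


config = json.loads(ZERO2_JSON)
auto_fields = find_auto_fields(config)
```

Every path in `auto_fields` takes its value from the trainer side, so for this setup that means `learning_rate`, `weight_decay`, `warmup_steps`, `micro_batch_size`, and `gradient_accumulation_steps` in `config.yml` are the settings that end up being used.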