secemp9
/

TraceBack-12b

@@ -23,11 +23,154 @@ It has many goals in mind, but mainly:
 - converting any non-reasoning model output/datasets to a reasoning synthetic dataset when used as input
 So far, current proof of concept managed to check the boxes for 1 and 3, and I plan on scaling this more as:
-- this only use Mistral nemo 12b as base
-- Was only trained for 2 epoch
-- Only 200k samples were used for finetuning (Qlora)
 So there are still much room for improvement
 This was trained using both instruction and solution as input, and the output being a plausible/possible/matching reasoning trace based on that.
-I believe this is the future of reasoning data generation. Stay tuned for an eval release

 - converting any non-reasoning model output/datasets to a reasoning synthetic dataset when used as input
 So far, current proof of concept managed to check the boxes for 1 and 3, and I plan on scaling this more as:
+- This only uses Mistral Nemo 12B as the base model
+- It was only trained for 2 epochs
+- Only 200k samples were used for finetuning (QLoRA); the dataset is at https://huggingface.co/datasets/secemp9/instruction_solution_thought
 So there are still much room for improvement
 This was trained using both instruction and solution as input, and the output being a plausible/possible/matching reasoning trace based on that.
+I believe this is the future of reasoning data generation. Stay tuned for an eval release
+Here is an inference example, using a ChatGPT instruction + solution as input:
+# Inference Example
+Here I use a simple example from ChatGPT, passing both the instruction and the solution as input to the model:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65986192b0c5357368bacbf8/rtuYmWGw8lk09AQi_dpX8.png)
+# Dataset Example
+The dataset format pairs an instruction + solution (input) with a reasoning trace (output).
+Sample conversation:
+```
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "Instruction:\ntext_here\nSolution:\ntext_here"
+    },
+    {
+      "role": "assistant",
+      "content": "text_here"
+    }
+  ]
+}
+```
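A minimal sketch of how one record in this format could be built and serialized as a JSONL line. The helper names `make_record` and `to_jsonl_line` and the sample strings are hypothetical. Only the `messages` layout and the `Instruction:` / `Solution:` labels come from the sample above, and the exact newline placement between label and text is an assumption.

```python
import json


def make_record(instruction: str, solution: str, reasoning: str) -> dict:
    """Build one chat-style sample: (instruction + solution) -> reasoning trace."""
    # Assumed layout: each label on its own line, followed by its text.
    user_content = f"Instruction:\n{instruction}\nSolution:\n{solution}"
    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": reasoning},
        ]
    }


def to_jsonl_line(record: dict) -> str:
    """Serialize a record as one JSONL line; newlines inside strings are escaped."""
    return json.dumps(record, ensure_ascii=False)


# Hypothetical sample, for illustration only.
record = make_record(
    instruction="What is 2 + 2?",
    solution="4",
    reasoning="Adding 2 and 2 gives 4, so the answer is 4.",
)
line = to_jsonl_line(record)
print(line)
```

Because `json.dumps` escapes the embedded newlines, each record stays on a single line, which is what a `.jsonl` file expects.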
+which looks like:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65986192b0c5357368bacbf8/GdbZxeLSDsJmZDHJ8SN-g.png)
+# Prompt Format
+For the prompt format, I tried not to overengineer it, though I'm sure there is a better way to format this.
+For now it's just:
+Instruction:
+Solution:
+The model's output doesn't have any formatting (for now); it's just the reasoning trace.
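As a rough sketch, the prompt could be assembled and wrapped in ChatML turns like this. The function names are hypothetical and the newline placement around the labels is an assumption. The ChatML wrapping is also an assumption, based on the training config setting `chat_template: chatml`; the released tokenizer may apply its template differently.

```python
def build_prompt(instruction: str, solution: str) -> str:
    """Join instruction and solution under the two labels used in training."""
    return f"Instruction:\n{instruction}\nSolution:\n{solution}"


def to_chatml(user_content: str) -> str:
    """Wrap a user turn in ChatML markers and open the assistant turn."""
    return (
        f"<|im_start|>user\n{user_content}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Hypothetical input, for illustration only.
prompt = to_chatml(build_prompt("Reverse the string 'abc'.", "cba"))
print(prompt)
```

The model would then continue from the open assistant turn with the plain reasoning trace.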
+# Axolotl config
+For this, I basically converted my Unsloth code to an Axolotl config file. I also used DeepSpeed. The configuration is below:
+config.yml
+```
+# Base model configuration
+base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
+load_in_4bit: true
+# Dataset configuration
+datasets:
+  - path: instruction_solution_to_thought_dataset.jsonl
+    type: chat_template
+# Chat template
+chat_template: chatml
+# LoRA adapter configuration
+adapter: lora
+lora_r: 16
+lora_alpha: 16
+lora_dropout: 0
+lora_target_modules:
+  - q_proj
+  - k_proj
+  - v_proj
+  - o_proj
+  - gate_proj
+  - up_proj
+  - down_proj
+# Training hyperparameters
+max_seq_length: 128000
+micro_batch_size: 2
+gradient_accumulation_steps: 8
+learning_rate: 3e-5
+num_epochs: 3
+warmup_steps: 100
+optimizer: adamw_8bit
+weight_decay: 0.01
+lr_scheduler_type: cosine
+max_grad_norm: 1.0
+output_dir: ./outputs_solution_to_thought
+seed: 3407
+merge_lora: true
+hf_upload: true
+hf_repo: secemp9/TraceBack-12b
+xformers_attention:
+flash_attention: true
+bf16: true          # Enable BF16 mixed precision
+# Multi-GPU training with DeepSpeed
+deepspeed: deepspeed_configs/zero2.json
+# Optional: Enable gradient checkpointing
+gradient_checkpointing: true
+```
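To make the batch settings concrete, here is a small sketch of the effective batch size and optimizer steps per epoch implied by `micro_batch_size` and `gradient_accumulation_steps`. The GPU counts are hypothetical, since the number of GPUs is not stated. It also assumes about 200k samples, no sample packing, and that the last partial batch is kept.

```python
import math

MICRO_BATCH_SIZE = 2    # micro_batch_size from config.yml
GRAD_ACCUM_STEPS = 8    # gradient_accumulation_steps from config.yml
NUM_SAMPLES = 200_000   # approximate dataset size stated above


def effective_batch_size(num_gpus: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * num_gpus


def steps_per_epoch(num_gpus: int) -> int:
    """Optimizer steps needed to see every sample once."""
    return math.ceil(NUM_SAMPLES / effective_batch_size(num_gpus))


for gpus in (1, 4, 8):  # hypothetical GPU counts
    print(gpus, effective_batch_size(gpus), steps_per_epoch(gpus))
```

On a single GPU this gives 16 samples per optimizer step, so the 100 warmup steps cover only a small fraction of one epoch.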
+deepspeed_configs/zero2.json
+```
+{
+  "zero_optimization": {
+    "stage": 2,
+    "allgather_partitions": true,
+    "allgather_bucket_size": 2e8,
+    "overlap_comm": true,
+    "reduce_scatter": true,
+    "reduce_bucket_size": 2e8,
+    "contiguous_gradients": true
+  },
+  "bf16": {
+    "enabled": true
+  },
+  "optimizer": {
+    "type": "AdamW",
+    "params": {
+      "lr": "auto",
+      "weight_decay": "auto",
+      "betas": [0.9, 0.999],
+      "eps": 1e-8
+    }
+  },
+  "scheduler": {
+    "type": "WarmupLR",
+    "params": {
+      "warmup_min_lr": 0,
+      "warmup_max_lr": "auto",
+      "warmup_num_steps": "auto"
+    }
+  },
+  "train_micro_batch_size_per_gpu": "auto",
+  "gradient_accumulation_steps": "auto",
+  "steps_per_print": 10,
+  "wandb": {
+    "enabled": true
+  }
+}
+```
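The `"auto"` entries are placeholders that get filled in from the training arguments at launch. The sketch below illustrates that idea only: the mapping table and the `resolve_auto` helper are hypothetical and follow my understanding of the Hugging Face Trainer DeepSpeed integration rather than any actual Axolotl code. The numeric values are copied from config.yml above.

```python
import copy

# Values taken from config.yml above.
train_cfg = {
    "learning_rate": 3e-5,
    "weight_decay": 0.01,
    "warmup_steps": 100,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 8,
}

# Trimmed copy of zero2.json, keeping only the "auto" fields.
ds_cfg = {
    "optimizer": {"params": {"lr": "auto", "weight_decay": "auto"}},
    "scheduler": {"params": {"warmup_max_lr": "auto", "warmup_num_steps": "auto"}},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Assumed mapping: DeepSpeed key path -> training config key.
AUTO_SOURCES = {
    ("optimizer", "params", "lr"): "learning_rate",
    ("optimizer", "params", "weight_decay"): "weight_decay",
    ("scheduler", "params", "warmup_max_lr"): "learning_rate",
    ("scheduler", "params", "warmup_num_steps"): "warmup_steps",
    ("train_micro_batch_size_per_gpu",): "micro_batch_size",
    ("gradient_accumulation_steps",): "gradient_accumulation_steps",
}


def resolve_auto(ds: dict, train: dict) -> dict:
    """Return a copy of ds with every mapped "auto" value filled in."""
    resolved = copy.deepcopy(ds)
    for path, source_key in AUTO_SOURCES.items():
        node = resolved
        for key in path[:-1]:
            node = node[key]
        if node.get(path[-1]) == "auto":
            node[path[-1]] = train[source_key]
    return resolved


resolved = resolve_auto(ds_cfg, train_cfg)
print(resolved)
```

Keeping these fields as `"auto"` avoids having two places where the learning rate or batch size could drift apart.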