Update README.md
README.md
CHANGED
@@ -11,117 +11,23 @@ model-index:
results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

<details><summary>See axolotl config</summary>

```yaml
# Base model configuration
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
load_in_4bit: true

# Dataset configuration
datasets:
  - path: instruction_solution_to_thought_dataset.jsonl
    type: chat_template

# Chat template
chat_template: chatml

# LoRA adapter configuration
adapter: lora
lora_r: 16
lora_alpha: 16
lora_dropout: 0
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Training hyperparameters
max_seq_length: 128000
micro_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 3e-5
num_epochs: 2
warmup_steps: 100
optimizer: adamw_8bit
weight_decay: 0.01
lr_scheduler_type: cosine
max_grad_norm: 1.0
output_dir: ./outputs_solution_to_thought
seed: 3407
merge_lora: true
hf_upload: true
hf_repo: secemp9/TraceBack-12b
xformers_attention:
flash_attention: True
#lora_mlp_kernel: true
#lora_qkv_kernel: true
#lora_o_kernel: true
#fp16: true
#load_in_8bit: true # Enable 8-bit loading for LoRA finetuning
bf16: true # Enable BF16 mixed precision

# Multi-GPU training with DeepSpeed
deepspeed: deepspeed_configs/zero2.json

# Optional: Enable gradient checkpointing
gradient_checkpointing: true
```

</details><br>
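For context, the `type: chat_template` dataset in the config above is a JSONL file of conversations. A hypothetical record for instruction_solution_to_thought_dataset.jsonl might look like the sketch below; the field names (`messages`, `role`, `content`) and the exact prompt layout are assumptions based on axolotl's chat_template dataset type, not taken from the actual dataset.

```python
import json

# Hypothetical shape of one line in instruction_solution_to_thought_dataset.jsonl:
# the instruction and its solution go in as the user turn, and the reasoning trace
# that should connect them is the assistant turn.
record = {
    "messages": [
        {
            "role": "user",
            "content": "Instruction:\n<instruction text>\n\nSolution:\n<solution text>",
        },
        {
            "role": "assistant",
            "content": "<reasoning trace that plausibly leads from the instruction to the solution>",
        },
    ]
}

with open("instruction_solution_to_thought_dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```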

# outputs_solution_to_thought

This model is a fine-tuned version of [unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit](https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit) on the instruction_solution_to_thought_dataset.jsonl dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 3407
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128 (see the note after this list)
- total_eval_batch_size: 16
- optimizer: adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
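As a quick cross-check, the reported total_train_batch_size follows directly from the per-device settings listed above:

```python
micro_batch_size = 2             # train_batch_size per device
gradient_accumulation_steps = 8
num_devices = 8

# Effective number of training samples consumed per optimizer step
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 128
```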

### Training results

### Framework versions

- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

# TraceBack 12b Release

TraceBack is what I came up with when I thought, "how can we scale reasoning trace data generation effectively?"

Turns out you do not need to depend on just reasoning models (r1, o1, o3, etc.) to create reasoning traces!

It has many goals in mind, but mainly:
- enabling faster synthetic reasoning dataset generation: since this is a small model (smaller than r1, etc.), inference is faster and therefore easier to scale
- control over the style of reasoning (system 2 thinking, etc.)
- converting any non-reasoning model output/dataset into a synthetic reasoning dataset when used as input

So far, the current proof of concept checks boxes 1 and 3, and I plan on scaling this further, since:
- it only uses Mistral Nemo 12b as the base model
- it was only trained for 2 epochs
- only 200k samples were used for finetuning (QLoRA)

So there is still much room for improvement.

The model was trained using both the instruction and the solution as input, with the output being a plausible/matching reasoning trace based on them, as sketched below.
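A minimal usage sketch with transformers: the exact prompt layout is an assumption (the card does not spell out how instruction and solution were formatted), so treat the user-turn template below as illustrative rather than definitive.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumed layout: instruction and solution are passed together as the user turn,
# and the model is expected to return a plausible reasoning trace.
instruction = "Write a function that reverses a string."
solution = "def reverse(s):\n    return s[::-1]"

messages = [
    {
        "role": "user",
        "content": f"Instruction:\n{instruction}\n\nSolution:\n{solution}",
    }
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Run over an existing instruction/solution dataset, this is also how a non-reasoning dataset can be turned into a synthetic reasoning dataset (goal 3 above).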

I believe this is the future of reasoning data generation. Stay tuned for an eval release.