|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit |
|
tags: |
|
- generated_from_trainer |
|
datasets: |
|
- instruction_solution_to_thought_dataset.jsonl |
|
- secemp9/instruction_solution_thought |
|
model-index: |
|
- name: outputs_solution_to_thought |
|
results: [] |
|
--- |
|
 |
|
|
|
# TraceBack 12b Release |
|
|
|
|
|
|
|
TraceBack is what I came up with when I thought, "how can we scale reasoning trace data generation effectively?" |
|
|
|
Turns out you do not need to depend on reasoning models (r1, o1, o3, etc.) to create reasoning traces!
|
|
|
It was built with several goals in mind, mainly:

1. enabling faster synthetic reasoning dataset generation: the model used here is small (smaller than r1, etc.), so inference is faster and scaling is easier

2. distilling on synthetic traces for out-of-domain, non-verifiable problems

3. converting any non-reasoning model's outputs/datasets into a synthetic reasoning dataset when used as input
|
|
|
So far, the current proof of concept checks the boxes for goals 1 and 3, and I plan on scaling this further, since:

- it only uses Mistral Nemo 12b as the base model

- it was only trained for 2 epochs

- only 200k samples were used for finetuning (QLoRA); the dataset is at https://huggingface.co/datasets/secemp9/instruction_solution_thought
|
|
|
So there is still much room for improvement.
|
|
|
This was trained with both the instruction and the solution as input, and with a plausible reasoning trace that matches them as the output.
|
|
|
I believe this is the future of reasoning data generation. Stay tuned for an eval release.
|
|
|
Here is an inference example, using a ChatGPT instruction + solution as input:
|
|
|
# Inference Example |
|
Here I use a simple example from ChatGPT, passing both the instruction and the solution as input to the model:
|
 |
|
|
|
# Dataset Example |
|
|
|
The dataset format follows instruction + solution: reasoning trace pairs.
|
Sample conversation: |
|
``` |
|
{
  "messages": [
    {
      "role": "user",
      "content": "Instruction:\ntext_here\n\nSolution:\ntext_here"
    },
    {
      "role": "assistant",
      "content": "text_here"
    }
  ]
}
|
``` |
|
which looks like:
|
 |
|
|
|
# Prompt Format |
|
|
|
For the prompt format, I really tried not to overengineer it, but I'm sure there is a better way to format this.
|
|
|
For now it's just: |
|
Instruction: |
|
|
|
Solution: |
|
|
|
The model output doesn't have any particular formatting (for now); it's just the reasoning trace.
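As a rough illustration (the helper below is just for this card, not part of the released code), the prompt can be assembled like this:

```python
def build_prompt(instruction: str, solution: str) -> str:
    """Assemble the plain Instruction/Solution prompt described above."""
    return f"Instruction:\n{instruction}\n\nSolution:\n{solution}"

# Example usage, matching the inference examples below
print(build_prompt("how many r in strawberry",
                   'There are **three** "r"s in "strawberry."'))
```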
|
|
|
# Code Example |
|
|
|
- Using transformers: |
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
import torch |
|
|
|
# Load the tokenizer and model |
|
model_name = "secemp9/TraceBack-12b" |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForCausalLM.from_pretrained(model_name) |
|
|
|
# Move the model to the desired device |
|
device = 'cuda' if torch.cuda.is_available() else 'cpu' |
|
model.to(device) |
|
|
|
# Define the messages |
|
messages = [ |
|
{"role": "user", "content": """Instruction: |
|
how many r in strawberry |
|
|
|
|
|
Solution: |
|
There are **three** "r"s in "strawberry." |
|
"""} |
|
] |
|
|
|
# Step 1: Apply chat template to get formatted text as a string |
|
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
|
|
# Step 2: Tokenize the formatted text into a dictionary of tensors |
|
inputs = tokenizer(formatted_text, return_tensors="pt").to(device) |
|
|
|
# Generate the response |
|
outputs = model.generate(**inputs, max_new_tokens=32000) |
|
|
|
# Decode and print the output |
|
generated_text = tokenizer.decode(outputs[0]) |
|
print(generated_text) |
|
``` |
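If you only want to print the newly generated reasoning trace rather than the full sequence (prompt included), here is one possible variant of the decoding step, reusing the `inputs` and `outputs` variables from the snippet above:

```python
# Slice off the prompt tokens so only the newly generated trace is decoded
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```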
|
|
|
- Using unsloth:
|
```python |
|
from unsloth import FastLanguageModel |
|
|
|
# Load the model and tokenizer |
|
model, tokenizer = FastLanguageModel.from_pretrained("secemp9/TraceBack-12b") |
|
|
|
# Define the messages
|
messages = [ |
|
{"role": "user", "content": """Instruction: |
|
how many r in strawberry |
|
|
|
|
|
Solution: |
|
There are **three** "r"s in "strawberry." |
|
"""} |
|
] |
|
|
|
# Step 1: Apply chat template to get formatted text as a string |
|
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
|
|
# Step 2: Tokenize the formatted text into a dictionary of tensors |
|
inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device) |
|
|
|
# Generate the response |
|
outputs = model.generate(**inputs, max_new_tokens=32000) |
|
|
|
# Decode and print the output |
|
generated_text = tokenizer.decode(outputs[0]) |
|
print(generated_text) |
|
``` |
|
# Axolotl Config

For this, I basically tried to convert my unsloth code to an Axolotl config file. I also used DeepSpeed. Configuration below:
|
|
|
config.yml |
|
``` |
|
# Base model configuration |
|
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit |
|
load_in_4bit: true |
|
|
|
# Dataset configuration |
|
datasets: |
|
- path: instruction_solution_to_thought_dataset.jsonl |
|
type: chat_template |
|
|
|
# Chat template |
|
chat_template: chatml |
|
|
|
# LoRA adapter configuration |
|
adapter: lora |
|
lora_r: 16 |
|
lora_alpha: 16 |
|
lora_dropout: 0 |
|
lora_target_modules: |
|
- q_proj |
|
- k_proj |
|
- v_proj |
|
- o_proj |
|
- gate_proj |
|
- up_proj |
|
- down_proj |
|
|
|
# Training hyperparameters |
|
max_seq_length: 128000 |
|
micro_batch_size: 2 |
|
gradient_accumulation_steps: 8 |
|
learning_rate: 3e-5 |
|
num_epochs: 3 |
|
warmup_steps: 100 |
|
optimizer: adamw_8bit |
|
weight_decay: 0.01 |
|
lr_scheduler_type: cosine |
|
max_grad_norm: 1.0 |
|
output_dir: ./outputs_solution_to_thought |
|
seed: 3407 |
|
merge_lora: true |
|
hf_upload: true |
|
hf_repo: secemp9/TraceBack-12b |
|
xformers_attention: |
|
flash_attention: True |
|
bf16: true # Enable BF16 mixed precision |
|
# Multi-GPU training with DeepSpeed |
|
deepspeed: deepspeed_configs/zero2.json |
|
|
|
# Optional: Enable gradient checkpointing |
|
gradient_checkpointing: true |
|
``` |
|
|
|
deepspeed_configs/zero2.json |
|
``` |
|
{ |
|
"zero_optimization": { |
|
"stage": 2, |
|
"allgather_partitions": true, |
|
"allgather_bucket_size": 2e8, |
|
"overlap_comm": true, |
|
"reduce_scatter": true, |
|
"reduce_bucket_size": 2e8, |
|
"contiguous_gradients": true |
|
}, |
|
"bf16": { |
|
"enabled": true |
|
}, |
|
"optimizer": { |
|
"type": "AdamW", |
|
"params": { |
|
"lr": "auto", |
|
"weight_decay": "auto", |
|
"betas": [0.9, 0.999], |
|
"eps": 1e-8 |
|
} |
|
}, |
|
"scheduler": { |
|
"type": "WarmupLR", |
|
"params": { |
|
"warmup_min_lr": 0, |
|
"warmup_max_lr": "auto", |
|
"warmup_num_steps": "auto" |
|
} |
|
}, |
|
"train_micro_batch_size_per_gpu": "auto", |
|
"gradient_accumulation_steps": "auto", |
|
"steps_per_print": 10, |
|
"wandb": { |
|
"enabled": true |
|
} |
|
} |
|
``` |
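To reproduce the run, the config above should work with Axolotl's usual CLI, e.g. `accelerate launch -m axolotl.cli.train config.yml` (the exact entry point may differ depending on your Axolotl version); the `deepspeed` key in config.yml points Axolotl at the ZeRO-2 file above.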