
Model Card for Llama-3.2-1B-Instruct-Open-R1-Distill

🚀 Introducing Llama-3.2-1B-Instruct-Open-R1-Distill

Built on Llama-3.2-1B-Instruct and Hugging Face's OpenR1 (a fully open reproduction of DeepSeek-R1), this model brings powerful reasoning capabilities to compact, efficient architectures.

📌 Why This Matters

I have always been passionate about pushing the boundaries of LLM technology with smaller models that can run seamlessly on laptop CPUs and smartphones.

With the recent breakthrough of DeepSeek-R1, developing a high-quality reasoning model through distillation has become remarkably straightforward. It requires only supervised fine-tuning (SFT) on a dataset generated by a teacher model.

Thanks to Hugging Face, we now have a streamlined framework to make this process more accessible than ever.
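
To make that concrete, here is a minimal, hypothetical sketch of what such a distillation run boils down to with trl's SFTTrainer. The single toy example below only mimics the thought/solution format of the teacher traces; it is not the actual Open R1 recipe or dataset.

from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Toy "teacher" data: in practice this would be thousands of reasoning traces
# generated by a strong teacher model such as DeepSeek-R1.
train_dataset = Dataset.from_list([
    {
        "messages": [
            {"role": "user", "content": "What is 17 * 24?"},
            {"role": "assistant", "content": "<|begin_of_thought|> 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408 <|end_of_thought|> <|begin_of_solution|> 408 <|end_of_solution|>"},
        ]
    }
])

trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B-Instruct",  # the student model
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="llama-3.2-1b-open-r1-distill-sketch", num_train_epochs=5),
)
trainer.train()

The real run additionally carries the long reasoning system prompt shown in the usage example below and trains on the full distillation dataset.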

Model Description

  • Developed by: keeeeenw
  • Funded by: myself, for < $50 (a few hours of rented compute)
  • Model type: Llama-3.2-1B-Instruct with reasoning capability
  • License: Apache License 2.0
  • Finetuned from model: Llama-3.2-1B-Instruct

🎯 Uses

  • 💡 On-device AI assistants for reasoning and general-purpose tasks
  • 📱 Mobile and edge AI applications requiring lightweight models
  • 🤖 Chatbots and virtual assistants optimized for efficiency
  • 🏗 Fine-tuning for specific domains with SFT training

How to run the model

from transformers import AutoTokenizer, LlamaForCausalLM, TextStreamer

device = 'cuda'  # if you don't have a CUDA-capable GPU, change this to 'cpu' or another supported device

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill")

# Load model and move it to the target device
model = LlamaForCausalLM.from_pretrained("keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill")
model.to(device)

# Set up the prompt. Because the model was instruction-tuned with this system prompt, it is important to keep it.
# Replace the user "content" below with your actual question.
messages = [
    {
        "role": "system",
        "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:",
    },
    {"role": "user", "content": "Please provide me instructions on how to steal an egg from my chicken?"},
]
# With tokenize=False this returns a formatted string; we tokenize it separately below.
formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(formatted_chat)

# Setup input tokens
inputs = tokenizer(formatted_chat, return_tensors="pt", padding=True, add_special_tokens=False)  # the chat template already adds <|begin_of_text|>
inputs = inputs.to(device)  
attention_mask = inputs["attention_mask"]

# Run inference and stream the output
streamer = TextStreamer(tokenizer, skip_prompt=True)
outputs = model.generate(inputs['input_ids'],
                         streamer=streamer,
                         attention_mask=attention_mask,
                         pad_token_id=tokenizer.eos_token_id,
                         do_sample=True,  # sample so that top_k / top_p below take effect
                         top_k=5,
                         top_p=0.9,
                         max_new_tokens=131072)  # maximum context length of Llama 3.2 1B

# Write output to a file
decoded_text = tokenizer.decode(outputs[0])
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(decoded_text)
print("Output written to output.txt")

Sample Output

Please see the full text: https://huggingface.co/keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill/blob/main/sample_output_2.txt

Okay, so I need to figure out how to steal an egg from my chicken. Let's start by understanding the situation. I have a chicken, and I want to take an egg from it without the chicken noticing. Since chickens can be protective of their eggs, I need to be careful not to get caught.

First, I should consider the chicken's behavior. Chickens are naturally protective of their nests and eggs. If the chicken is aware of my presence near the coop, it might sound a warning call to alert others, which could mean I'm caught. So, maybe I need a strategy that doesn't involve direct interaction with the chicken.

One approach could be to use distraction. If I create a distraction elsewhere, like making noise or knocking things around, maybe the chicken is distracted and doesn't pay attention to me. But I need to make sure the distraction isn't too intense or long enough to be detected.

Another idea is to wait until the chicken is in a hurry. If I wait until it's leaving the coop or going to a specific location, I might have a better chance to grab the egg without it noticing. But I need to be cautious not to be seen myself.

I should also think about the physical barriers. If I can get to the egg without the chicken seeing me, that might work. Maybe I can use a tool to gently take the egg from the nesting box without disturbing the rest of the chicken.

Wait, but how do I know if the chicken has an egg in the first place? If I can't see the egg, how do I know it's there? Maybe I need to check the nesting box, but if I do, and the chicken sees me, it might chase me away, making the situation more difficult.

Hmm, this complicates things. If I need to check the egg without disturbing the chicken, perhaps I can do it quickly and quietly. But how do I do that without being detected?

I should also consider the chicken's behavior around eggs. Some chickens are more protective than others. Maybe there's a way to exploit that difference. For example, if the chicken is particularly aggressive when defending its eggs, I could take advantage of that.

Another thought: maybe I can use the chicken's own behavior against it. If I can make the chicken work harder to guard the egg, perhaps it will exhaust itself and leave me alone. But I need to be careful not to overdo it.

// ... a few moments later (added by me)

**Analysis of Each Option**

- **Option 1: Direct Approach**

  - **Risk:** High (get caught)

  - **Potential Reward:** Possible (take the egg)

  - **Steps:**

    a. Approach the chicken while it's foraging.

    b. Try to take the egg from the nesting box.

    c. If caught, escape or handle the egg.

    d. Repeat.

  - **Potential Downsides:**

    - The chicken might get aggressive if it realizes you're trying to take its egg.

    - If you're caught, you could get hurt or the chicken could be upset.

  - **Conclusion:** This approach is risky and may not be successful.

- **Option 2: Indirect Approach**

  - **Risk:** Medium (chicken might see you and chase)

  - **Potential Reward:** Possible (take the egg)

  - **Steps:**

    a. Create a distraction (e.g., make noise, knock over a bucket).

    b. The chicken will be distracted, reducing the likelihood of being seen.

    c. Sneak up to the chicken and try to take the egg.

    d. If safe, escape without being seen.

  - **Potential Downsides:**

    - If the distraction is too strong or the chicken reacts poorly, you might get caught.

    - If the chicken is distracted, it might not notice you taking the egg.

  - **Conclusion:** This approach is more likely to succeed if the distraction is well-executed.

- **Option 3: Combination Approach**

  - **Risk:** Medium (chicken might be distracted)

  - **Potential Reward:** Possible (take the egg)

  - **Steps:**

    a. Create a distraction (e.g., make noise, knock over a bucket).

    b. Lead the chicken away from the nesting area.

    c. If the chicken follows, it will be less likely to notice you taking the egg.

    d. If safe, retrieve the egg and return with it.

    e. Repeat if needed.

  - **Potential Downsides:**

    - The chicken might not follow you, or you might get lost.

    - You need to coordinate the distraction and the retrieval.

  - **Conclusion:** This approach requires careful planning and execution to be successful.

After considering all options, the best approach seems to be the combination approach, as it reduces the risk of direct confrontation and allows for a more controlled retrieval of the egg. However, it requires careful planning and coordination to execute successfully.

**Final Decision**

The best way to steal an egg from my chicken is to use the combination approach, where I create a distraction to lead the chicken away, then retrieve the egg safely. This minimizes the risk of direct confrontation and ensures that I can get the egg without getting caught. I should proceed with caution and careful planning to execute this strategy effectively.

<|end_of_thought|>

<|begin_of_solution|>

To steal an egg from a chicken, consider the following approach:

1. **Create a Distraction**: Use noise-making devices or a bucket to create a distraction, drawing the chicken's attention away from the nesting area.

2. **Lead the Chicken Away**: Once the distraction is created, lead the chicken away from the nesting area, minimizing the likelihood of direct interaction.

3. **Retrieval**: If the chicken follows, carefully retrieve the egg. If it does not follow, continue the distraction elsewhere.

4. **Return with the Egg**: Return to the original location with the egg, ensuring its safety and security.

This approach reduces the risk of direct confrontation and ensures a controlled retrieval of the egg. The combination of a distraction and a well-coordinated retrieval increases the chances of success.

\boxed{\text{Create a distraction, lead the chicken away, and retrieve the egg safely.}}<|eot_id|>

A Few Observations

  1. Llama-3.2-1B-Instruct proved to be a strong base model for reasoning tasks. Even with absurd prompts like "How to steal an egg from a chicken?", the model generated coherent step-by-step reasoning and logical final answers.
  2. ⚠️ Important: The reasoning model sometimes runs excessively long or even enters an infinite loop, particularly when exploring alternative solutions. This issue can likely be mitigated by incorporating prompts that balance both short and long reasoning paths, and by refining the role instructions through prompt engineering (a generic inference-side workaround is also sketched after this list).
  3. Model safety: Occasionally, the model refuses to answer certain questions. My intuition is that Meta has implemented safeguards against topics like theft.
  4. Training process: I did not complete all five epochs of training. Instead, I halted training between the fourth and fifth epochs since evaluation loss had plateaued. Interestingly, when testing the best checkpoint (900) based on evaluation loss, the model showed a higher tendency to enter infinite loops. As a result, I retained the final checkpoint, which demonstrated better control over stopping conditions.

Checkpoints

My checkpoints are available on Hugging Face: https://huggingface.co/keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill-checkpoints/tree/main

Please feel free to use them for continued training, or load any checkpoint for an in-depth study of how the model learns to reason.

πŸ‹οΈβ€β™‚οΈ Training Details

To reproduce the results, go to Hugging Face's OpenR1 repository and install the package.

Then execute the following command:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py --config recipes/config_llama3_instruct_1b.yaml

You can create your own recipes/config_llama3_instruct_1b.yaml by copying config_full.yaml to the desired folder and changing the model path to model_name_or_path: meta-llama/Llama-3.2-1B-Instruct (or any Hugging Face model repo id you are interested in). You may also choose to train for more than 1 epoch (I trained for 5 epochs). If you want intermediate checkpoints, set the save parameters accordingly:

save_strategy: "steps"
save_steps: 100
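
Putting those pieces together, a hedged excerpt of what the relevant lines of such a recipe might look like (only the fields discussed here, plus the standard num_train_epochs option; everything else stays as in config_full.yaml):

# recipes/config_llama3_instruct_1b.yaml (excerpt)
model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
num_train_epochs: 5
save_strategy: "steps"
save_steps: 100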

I tried using a batch size of 1 for both training and evaluation on a single NVIDIA RTX 4090 but still got OOM, so I rented 4 x L40S GPUs from vast.ai. Training for 5 epochs took less than 4 hours with the following batch sizes:

per_device_eval_batch_size: 4
per_device_train_batch_size: 4

WandB Figures

[WandB training figures]

📊 Evaluation

The evaluation of this model follows Hugging Face's OpenR1 evaluation instructions:

MODEL=keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=math_500
OUTPUT_DIR=data/evals/$MODEL

lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
    --output-dir $OUTPUT_DIR

Results:

| Task              | Version | Metric           | Value |   | Stderr |
|-------------------|--------:|------------------|------:|---|-------:|
| all               |         | extractive_match | 0.216 | ± | 0.0184 |
| custom:math_500:0 |       1 | extractive_match | 0.216 | ± | 0.0184 |

For comparison, DeepSeek-R1-Distill-Qwen-1.5B scores 81.6 (i.e., 0.816 on the scale above) when computed with the same evaluation script (as reported by Hugging Face), which is close to the official 83.9 reported by DeepSeek.

There is still a long way to go for score improvements:

  1. Distillation with actual math data instead of HuggingFaceH4/Bespoke-Stratos-17k. Data is likely the real bottleneck here, and we can potentially collect and filter more data ourselves.
  2. Test a few other checkpoints to see if this particular checkpoint achieves the best results.
  3. This model tends to be wordy. We should try to make it more concise because the evaluation's model length limit is only 32768 tokens.
  4. Try out GRPO and / or a combination of GRPO and SFT (a minimal GRPO sketch follows below).
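
On the last point, trl ships a GRPOTrainer that could serve as a starting point. A minimal, hypothetical sketch (the toy prompts and reward function below are placeholders, not anything this model was trained with):

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompts; a real run would use math problems with verifiable answers.
train_dataset = Dataset.from_list([
    {"prompt": "What is 17 * 24? Put your final answer within \\boxed{}."},
    {"prompt": "What is 123 + 456? Put your final answer within \\boxed{}."},
])

# Toy reward: encourage completions that actually emit a \boxed{...} answer.
def boxed_reward(completions, **kwargs):
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

trainer = GRPOTrainer(
    model="keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill",  # start from the SFT-distilled model
    reward_funcs=boxed_reward,
    args=GRPOConfig(output_dir="llama-3.2-1b-open-r1-grpo-sketch"),
    train_dataset=train_dataset,
)
trainer.train()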