---
base_model:
- llama-3.2-3b-instruct-bnb-4bit
- unsloth/Llama-3.2-3B-Instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
- GRPO
license: apache-2.0
language:
- en
---

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/669777597cb32718c20d97e9/4emWK_PB-RrifIbrCUjE8.png"
     alt="Title card" 
     style="width: 500px;
            height: auto;
            object-position: center top;">
</div>

**Website - https://www.alphaai.biz**

# Uploaded model

- **Developed by:** alphaaico
- **License:** apache-2.0
- **Finetuned from model:** llama-3.2-3b-instruct-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

**Deep-Reason-SMALL-V0**

**Overview**

Deep-Reason-SMALL-V0 is a fine-tuned version of llama-3.2-3b-instruct designed for advanced reasoning and structured thinking. It was trained with GRPO (Group Relative Policy Optimization) on a custom dataset curated to enhance logical inference, decision-making, and structured reasoning.

Built with Unsloth and Hugging Face's TRL, the model is optimized for fast inference and strong logical performance.

The model is available in GGUF and 16-bit formats and has been quantized to several levels to support a range of hardware configurations.

**Model Details**
- Base Model: Llama 3.2 3B Instruct
- Fine-tuned By: Alpha AI
- Training Framework: Unsloth

**Quantization Levels Available**
- q4_k_m
- q5_k_m
- q8_0
- 16-bit (this repository)

GGUF Models - https://huggingface.co/alpha-ai/Deep-Reason-SMALL-V0-GGUF

**Key Features**
- Enhanced Reasoning: Fine-tuned using GRPO to improve problem-solving and structured thought processes.
- Optimized for Thinking Tasks: Excels in logical, multi-step, and causal reasoning.
- Structured XML Responses: Outputs are formatted using structured &lt;reasoning&gt;...&lt;/reasoning&gt; and &lt;answer&gt;...&lt;/answer&gt; sections for easy parsing.
- Efficient Deployment: Available in GGUF format for local AI deployments on consumer hardware.

**Response Format & Parsing Instructions**
Deep-Reason-SMALL-V0 follows a structured response format with designated XML-like tags. Each response contains &lt;reasoning&gt;...&lt;/reasoning&gt; and &lt;answer&gt;...&lt;/answer&gt; sections; when consuming the model programmatically, extract the content between these tags. This keeps the model's decision-making clear and traceable.
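As a minimal sketch of such parsing (the `parse_response` helper and the sample response below are illustrative, not part of the model's API):

```python
import re

def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a model response.

    Returns None for any section whose tags are missing.
    """
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

# Illustrative response in the documented format
raw = """<reasoning>
2 + 2 equals 4 because addition combines the two quantities.
</reasoning>
<answer>
4
</answer>"""

parsed = parse_response(raw)
print(parsed["answer"])  # prints: 4
```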

**Ideal Configuration for using the GGUF Models**
- temperature = 0.8
- top_p = 0.95
- max_tokens = 1024
- SYSTEM_PROMPT = """
Respond in the following format:
&lt;reasoning&gt;
...
&lt;/reasoning&gt;
&lt;answer&gt;
...
&lt;/answer&gt;
"""
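For example, these settings could be wired up through the llama-cpp-python bindings; the GGUF filename and the user question below are placeholders, and the inference call itself is shown commented out as a sketch:

```python
# Recommended system prompt and sampling settings from above
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

params = {"temperature": 0.8, "top_p": 0.95, "max_tokens": 1024}

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Which is larger, 9.11 or 9.9?"},  # placeholder question
]

# Sketch of the actual call (requires llama-cpp-python and a local GGUF file):
# from llama_cpp import Llama
# llm = Llama(model_path="Deep-Reason-SMALL-V0.q4_k_m.gguf", n_ctx=4096)
# result = llm.create_chat_completion(messages=messages, **params)
# print(result["choices"][0]["message"]["content"])
```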

**Use Cases**
Deep-Reason-SMALL-V0 is best suited for:
- Conversational AI – Improving chatbot and AI assistant reasoning.
- AI Research – Studying logical thought modeling in AI.
- Automated Decision Making – Powering AI-driven business intelligence systems.
- Education & Tutoring – Helping students and professionals with structured learning.
- Legal & Financial Analysis – Generating step-by-step arguments for case studies.

**Limitations & Considerations**
- May require further fine-tuning for domain-specific logic.
- Not a factual knowledge base – Focused on reasoning, not general knowledge retrieval.
- Potential biases – Results depend on training data.
- Computational Trade-off – Reasoning performance comes at the cost of slightly longer inference times.

**License**

This model is released under the Apache 2.0 license.

**Acknowledgments**

Special thanks to the Unsloth team for providing an optimized training pipeline for LLaMA models.

**Disclaimer**
This model has been saved in the .bin format because it was trained using Unsloth. The .bin format is the default PyTorch serialization method and functions as expected. However, .bin files use Python's pickle module, which can execute arbitrary code during loading.

If security is a concern, we strongly recommend loading the model in a sandboxed environment such as staging servers, Kaggle, or Google Colab before deploying in production. You can also convert the model to .safetensors, a more secure and optimized format, using the following approach:

```python
from transformers import AutoModelForCausalLM
from safetensors.torch import save_file

# Load the full causal-LM checkpoint (plain AutoModel would drop the LM head)
model = AutoModelForCausalLM.from_pretrained("path/to/model")
state_dict = model.state_dict()

# Convert to safetensors. Note: save_file rejects shared tensors, so if the
# checkpoint ties its embeddings, use
# model.save_pretrained("out_dir", safe_serialization=True) instead.
save_file(state_dict, "model.safetensors")

print("Model converted to safetensors successfully.")
```
Alternatively, you can use our GGUF models, which are optimized for inference with llama.cpp, exllama, and other efficient runtimes. GGUF provides better performance on CPU/GPU and is a more portable option for deployment.

Choose the format that best suits your security, performance, and deployment needs.