alphaaico committed
Commit e8e7693 · verified · 1 Parent(s): b5c642f

Update README.md

Files changed (1)
  1. README.md +73 -5
README.md CHANGED
@@ -1,12 +1,14 @@
  ---
- base_model: unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
- - trl
- - grpo
  license: apache-2.0
  language:
  - en
@@ -16,8 +18,74 @@ language:

  - **Developed by:** alphaaico
  - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit

  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model:
+ - llama-3.2-3b-instruct-bnb-4bit
+ - unsloth/Llama-3.2-3B-Instruct-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
+ - gguf
+ - GRPO
  license: apache-2.0
  language:
  - en
 
  - **Developed by:** alphaaico
  - **License:** apache-2.0
+ - **Finetuned from model:** llama-3.2-3b-instruct-bnb-4bit

  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

+ **Deep-Reason-SMALL-V0**
+
+ **Overview**
+
+ Deep-Reason-SMALL-V0 is a fine-tuned version of llama-3.2-3b-instruct, designed for advanced reasoning and structured thinking. It was trained with GRPO-based reasoning techniques on a custom dataset curated to strengthen logical inference, decision-making, and structured reasoning.
+
+ Built with Unsloth and Hugging Face’s TRL, the model is optimized for efficient inference and strong logical performance.
+
+ The model is available in GGUF and 16-bit formats and has been quantized to several levels to support a range of hardware configurations.
+
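A minimal loading sketch for the 16-bit weights with Hugging Face Transformers is shown below; the repo id is an assumption inferred from the GGUF link in this card and may need adjusting.

```python
# Minimal sketch: load the 16-bit weights with Hugging Face Transformers.
# The repo id below is an assumption inferred from the GGUF link in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "alpha-ai/Deep-Reason-SMALL-V0"  # assumed repo id for the 16-bit weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # load in 16-bit precision
    device_map="auto",           # place layers on available GPU/CPU
)
```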
+ **Model Details**
+ - Base Model: Llama 3.2 3B Instruct
+ - Fine-tuned By: Alpha AI
+ - Training Framework: Unsloth
+
+ **Quantization Levels Available**
+ - q4_k_m
+ - q5_k_m
+ - q8_0
+ - 16-bit (this repository)
+
+ GGUF Models - https://huggingface.co/alpha-ai/Deep-Reason-SMALL-V0-GGUF
+
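A minimal sketch for running one of the GGUF quantizations locally with llama-cpp-python; the filename pattern is an assumption, so pick the quantization that fits your hardware.

```python
# Minimal sketch: run a GGUF quantization locally with llama-cpp-python.
# Assumptions: llama-cpp-python and huggingface_hub are installed, and the
# filename pattern below matches a file in the GGUF repository.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="alpha-ai/Deep-Reason-SMALL-V0-GGUF",
    filename="*q4_k_m.gguf",  # or *q5_k_m.gguf / *q8_0.gguf, depending on hardware
    n_ctx=4096,               # context window
)
```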
+ **Key Features**
+ - Enhanced Reasoning: Fine-tuned using GRPO to improve problem-solving and structured thought processes.
+ - Optimized for Thinking Tasks: Excels at logical, multi-step, and causal reasoning.
+ - Structured XML Responses: Outputs are wrapped in <reasoning>...</reasoning> and <answer>...</answer> sections for easy parsing.
+ - Efficient Deployment: Available in GGUF format for local AI deployments on consumer hardware.
+
+ **Response Format & Parsing Instructions**
+
+ Deep-Reason-SMALL-V0 follows a structured response format with designated XML-like tags: each response includes a <reasoning>...</reasoning> section and an <answer>...</answer> section. When consuming the output programmatically, extract the content of these tags; this keeps the model's reasoning and its final answer clearly separated and traceable.
+
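A minimal parsing sketch, assuming well-formed tags in the completion and using Python's standard re module:

```python
# Minimal sketch: pull the <reasoning> and <answer> sections out of a completion.
# Assumes the model emitted both tags; returns None for any missing section.
import re

def parse_response(text: str) -> dict:
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

example = "<reasoning>\n2 + 2 is basic addition.\n</reasoning>\n<answer>\n4\n</answer>"
print(parse_response(example))  # {'reasoning': '2 + 2 is basic addition.', 'answer': '4'}
```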
+ **Ideal Configuration for using the GGUF Models**
+ - temperature = 0.8
+ - top_p = 0.95
+ - max_tokens = 1024
+ - SYSTEM_PROMPT = """
+ Respond in the following format:
+ <reasoning>
+ ...
+ </reasoning>
+ <answer>
+ ...
+ </answer>
+ """
+
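Putting those settings together, a hedged end-to-end sketch with llama-cpp-python's chat API; the repo id and GGUF filename are assumptions, as above.

```python
# Minimal sketch: apply the recommended sampling settings and system prompt.
# Assumptions: llama-cpp-python and huggingface_hub are installed, and the
# GGUF filename pattern matches a file in the GGUF repository.
from llama_cpp import Llama

SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

llm = Llama.from_pretrained(
    repo_id="alpha-ai/Deep-Reason-SMALL-V0-GGUF",
    filename="*q4_k_m.gguf",  # assumed filename pattern
    n_ctx=4096,
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "A bat and a ball cost $1.10 in total; the bat costs $1 more than the ball. What does the ball cost?"},
    ],
    temperature=0.8,   # recommended sampling temperature
    top_p=0.95,        # recommended nucleus sampling
    max_tokens=1024,   # recommended completion budget
)
print(result["choices"][0]["message"]["content"])
```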
+ **Use Cases**
+
+ Deep-Reason-SMALL-V0 is best suited for:
+ - Conversational AI – improving the reasoning of chatbots and AI assistants.
+ - AI Research – studying logical thought modeling in language models.
+ - Automated Decision Making – powering AI-driven business intelligence systems.
+ - Education & Tutoring – supporting structured learning for students and professionals.
+ - Legal & Financial Analysis – generating step-by-step arguments for case studies.
+
+ **Limitations & Considerations**
+ - May require further fine-tuning for domain-specific logic.
+ - Not a factual knowledge base – focused on reasoning rather than general knowledge retrieval.
+ - Potential biases – outputs reflect the biases of the training data.
+ - Computational trade-off – stronger reasoning comes at the cost of slightly longer inference times.
+
+ **License**
+
+ This model is released under the Apache 2.0 license, a permissive open-source license.
+
+ **Acknowledgments**
+
+ Special thanks to the Unsloth team for providing an optimized training pipeline for LLaMA models.