alphaaico committed
Commit e8e7693 · verified · 1 Parent(s): b5c642f

Update README.md

Files changed (1)
  1. README.md +73 -5
README.md CHANGED
@@ -1,12 +1,14 @@
  ---
- base_model: unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
- - trl
- - grpo
  license: apache-2.0
  language:
  - en
@@ -16,8 +18,74 @@ language:

  - **Developed by:** alphaaico
  - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3.2-3b-instruct-unsloth-bnb-4bit

  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model:
+ - llama-3.2-3b-instruct-bnb-4bit
+ - unsloth/Llama-3.2-3B-Instruct-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
+ - gguf
+ - GRPO
  license: apache-2.0
  language:
  - en
 
  - **Developed by:** alphaaico
  - **License:** apache-2.0
+ - **Finetuned from model:** llama-3.2-3b-instruct-bnb-4bit

  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

+ **Deep-Reason-SMALL-V0**
+
+ **Overview**
+
+ Deep-Reason-SMALL-V0 is a fine-tuned version of llama-3.2-3b-instruct, designed for advanced reasoning and structured thinking. It was trained with GRPO-based reasoning techniques on a custom dataset curated to strengthen logical inference, decision-making, and structured reasoning.
+
+ Built with Unsloth and Hugging Face’s TRL, the model is optimized for efficient inference and strong logical performance.
+
+ The model is available in GGUF and 16-bit formats and has been quantized to several levels to support a range of hardware configurations.
+
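A minimal loading sketch for the 16-bit weights with Hugging Face Transformers is shown below; the repo id is an assumption inferred from the GGUF link in this card and may need adjusting.

```python
# Minimal sketch: load the 16-bit weights with Hugging Face Transformers.
# The repo id below is an assumption inferred from the GGUF link in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "alpha-ai/Deep-Reason-SMALL-V0"  # assumed repo id for the 16-bit weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # load in 16-bit precision
    device_map="auto",           # place layers on available GPU/CPU
)
```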
+ **Model Details**
+ - Base Model: Llama 3.2 3B Instruct
+ - Fine-tuned By: Alpha AI
+ - Training Framework: Unsloth
+
+ **Quantization Levels Available**
+ - q4_k_m
+ - q5_k_m
+ - q8_0
+ - 16-bit (this repository)
+
+ GGUF Models - https://huggingface.co/alpha-ai/Deep-Reason-SMALL-V0-GGUF
+
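A minimal sketch for running one of the GGUF quantizations locally with llama-cpp-python; the filename pattern is an assumption, so pick the quantization that fits your hardware.

```python
# Minimal sketch: run a GGUF quantization locally with llama-cpp-python.
# Assumptions: llama-cpp-python and huggingface_hub are installed, and the
# filename pattern below matches a file in the GGUF repository.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="alpha-ai/Deep-Reason-SMALL-V0-GGUF",
    filename="*q4_k_m.gguf",  # or *q5_k_m.gguf / *q8_0.gguf, depending on hardware
    n_ctx=4096,               # context window
)
```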
+ **Key Features**
+ - Enhanced Reasoning: Fine-tuned using GRPO to improve problem-solving and structured thought processes.
+ - Optimized for Thinking Tasks: Excels at logical, multi-step, and causal reasoning.
+ - Structured XML Responses: Outputs are wrapped in <reasoning>...</reasoning> and <answer>...</answer> sections for easy parsing.
+ - Efficient Deployment: Available in GGUF format for local AI deployments on consumer hardware.
+
+ **Response Format & Parsing Instructions**
+
+ Deep-Reason-SMALL-V0 follows a structured response format with designated XML-like tags: each response includes a <reasoning>...</reasoning> section and an <answer>...</answer> section. When consuming the output programmatically, extract the content of these tags; this keeps the model's reasoning and its final answer clearly separated and traceable.
+
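A minimal parsing sketch, assuming well-formed tags in the completion and using Python's standard re module:

```python
# Minimal sketch: pull the <reasoning> and <answer> sections out of a completion.
# Assumes the model emitted both tags; returns None for any missing section.
import re

def parse_response(text: str) -> dict:
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

example = "<reasoning>\n2 + 2 is basic addition.\n</reasoning>\n<answer>\n4\n</answer>"
print(parse_response(example))  # {'reasoning': '2 + 2 is basic addition.', 'answer': '4'}
```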
+ **Ideal Configuration for using the GGUF Models**
+ - temperature = 0.8
+ - top_p = 0.95
+ - max_tokens = 1024
+ - SYSTEM_PROMPT = """
+ Respond in the following format:
+ <reasoning>
+ ...
+ </reasoning>
+ <answer>
+ ...
+ </answer>
+ """
+
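Putting those settings together, a hedged end-to-end sketch with llama-cpp-python's chat API; the repo id and GGUF filename are assumptions, as above.

```python
# Minimal sketch: apply the recommended sampling settings and system prompt.
# Assumptions: llama-cpp-python and huggingface_hub are installed, and the
# GGUF filename pattern matches a file in the GGUF repository.
from llama_cpp import Llama

SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

llm = Llama.from_pretrained(
    repo_id="alpha-ai/Deep-Reason-SMALL-V0-GGUF",
    filename="*q4_k_m.gguf",  # assumed filename pattern
    n_ctx=4096,
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "A bat and a ball cost $1.10 in total; the bat costs $1 more than the ball. What does the ball cost?"},
    ],
    temperature=0.8,   # recommended sampling temperature
    top_p=0.95,        # recommended nucleus sampling
    max_tokens=1024,   # recommended completion budget
)
print(result["choices"][0]["message"]["content"])
```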
+ **Use Cases**
+
+ Deep-Reason-SMALL-V0 is best suited for:
+ - Conversational AI – improving the reasoning of chatbots and AI assistants.
+ - AI Research – studying logical thought modeling in language models.
+ - Automated Decision Making – powering AI-driven business intelligence systems.
+ - Education & Tutoring – supporting structured learning for students and professionals.
+ - Legal & Financial Analysis – generating step-by-step arguments for case studies.
+
+ **Limitations & Considerations**
+ - May require further fine-tuning for domain-specific logic.
+ - Not a factual knowledge base – focused on reasoning rather than general knowledge retrieval.
+ - Potential biases – outputs reflect the biases of the training data.
+ - Computational trade-off – stronger reasoning comes at the cost of slightly longer inference times.
+
+ **License**
+
+ This model is released under the Apache 2.0 license, a permissive open-source license.
+
+ **Acknowledgments**
+
+ Special thanks to the Unsloth team for providing an optimized training pipeline for LLaMA models.