---
# 🦄 Model Card
base_model: unsloth/Qwen2.5-3B-Instruct
| **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="180"/>](https://github.com/unslothai/unsloth)

---

## 🚀 What’s New?

Think of this as the protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**, but now with **3B parameters, zero SFT, and a reward-only training regime** (GRPO) backed by an LM judge plus auxiliary reward functions.

| Upgrade | Explanation |
|---------|-------------|
| **Bigger Backbone** | 1.5B → **3B** Qwen 2.5 for deeper reasoning headroom. |
| **Pure RL** | No supervised fine-tuning; the policy is learned *entirely* from reward signals. |
| **LM-as-Judge** | A separate LLM scores each candidate for correctness, JSON validity, length, and style. |
| **2× Faster Training** | Courtesy of Unsloth’s memory savings (flash attention, fused ops). |

---

## 🛠️ Intended Use

- Structured-data extraction from messy prose, logs, or transcripts.
- A drop-in upgrade for any pipeline currently using the older 1.5B DeepSeek-R1 JSON structurer.

---

## 🔧 How to Use

Below is a minimal example that **re-uses the exact prompt format** from the previous model.
The model first *reasons* silently (the chain-of-thought is kept internal) and then emits only the target JSON.

> **Model name** used in the snippet → `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit` (the merged 16-bit weights loaded below).
> Replace it with your actual repo path if different.

### 1️⃣ Transformers Quick-Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit"

tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)

# --- Prompt (identical structure to the previous model) ---
system_prompt = (
    "You are an intelligent JSON conversion engine. "
    "Think step-by-step, and then output the final valid JSON."
)

task_prompt = textwrap.dedent("""\
    ### Task
    Convert the following unstructured text into the JSON schema shown below.
    Return *only* valid JSON.

    ### Schema
    {
      "name": str,
      "age": int,
      "city": str,
      "skills": [str]
    }

    ### Unstructured text
    John Doe, a 28-year-old software engineer living in Austin, loves Python and Golang.
""")

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tok,
    max_new_tokens=256,
    do_sample=False,
)

# return_full_text=False strips the prompt, leaving only the model's completion
output = generator(
    f"<|system|>\n{system_prompt}\n<|user|>\n{task_prompt}",
    return_full_text=False,
)[0]["generated_text"]

data = json.loads(output)  # raises if the JSON isn't valid
print(data)
```
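The quick-start assumes the completion is pure JSON. If a checkpoint ever wraps the object in stray text, a small defensive parser keeps `json.loads` from failing; this is a generic sketch, not part of this model's API:

```python
import json

def extract_json(text: str) -> dict:
    """Best-effort parse: take the outermost {...} span of a completion."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in completion")
    return json.loads(text[start : end + 1])

print(extract_json('Reasoning done. {"name": "John Doe", "age": 28}'))
# {'name': 'John Doe', 'age': 28}
```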

### 2️⃣ Text-Generation-Inference (TGI)

```bash
# start the server (8-bit, BF16, etc. as needed)
text-generation-launcher --model-id MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit

# curl call (note the JSON content type)
curl http://localhost:8080/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "<|system|>\nYou are an intelligent JSON conversion engine. Think step-by-step, but ONLY output the final valid JSON.\n<|user|>\n### Task\nConvert the following unstructured text into the JSON schema shown below.\nReturn *only* valid JSON.\n\n### Schema\n{\"title\": str, \"authors\": [str], \"year\": int}\n\n### Unstructured text\n\"Deep Learning\" was written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville in 2016.\n",
    "parameters": {"max_new_tokens": 256, "do_sample": false}
  }'
```

The response will be a pure JSON string, e.g.:

```json
{"title":"Deep Learning","authors":["Ian Goodfellow","Yoshua Bengio","Aaron Courville"],"year":2016}
```

---

## 🤖 Why This Prompt Works

1. **System role** instructs the model to plan internally and expose *only* the JSON.
2. **Schema block** constrains the output keys and types.
3. **GRPO training** rewarded strict adherence to schema validity and penalized hallucinated keys and malformed JSON.
4. **LM-as-Judge** provided dense shaping signals: structural accuracy, content fidelity, token length, even stylistic consistency.

The result: *reliable one-shot structuring without post-processing hacks*.
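The schema contract in step 2 is also easy to enforce on the consumer side. A minimal sketch, where the `SCHEMA` dict and `validate` helper are illustrative and not part of the model card's pipeline:

```python
import json

# Illustrative mirror of the prompt's schema: key -> expected Python type
SCHEMA = {"name": str, "age": int, "city": str, "skills": list}

def validate(raw: str) -> dict:
    """Parse model output and check it against the schema; raise on any mismatch."""
    data = json.loads(raw)  # fails fast on malformed JSON
    if set(data) != set(SCHEMA):
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ set(SCHEMA))}")
    for key, expected in SCHEMA.items():
        if not isinstance(data[key], expected):
            raise TypeError(f"{key!r} should be {expected.__name__}")
    return data

good = '{"name": "John Doe", "age": 28, "city": "Austin", "skills": ["Python", "Golang"]}'
print(validate(good)["age"])  # 28
```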

---

## 🏋️ Training Recipe (Condensed)

| Setting | Value |
|----------------------|---------------------------------------------------------------------|
| **Algorithm** | GRPO (policy ≈ LM; reward LM ≈ `Qwen2.5-7B` w/ JSON validator head) |
| **Effective Epochs** | 3 |
| **Batching** | Gradient accumulation 8, bfloat16 |
| **Optimizer** | Fused AdamW |
| **Throughput** | ~45k tokens/s on 8×A100 |
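The reward side of this setup can be pictured as small scoring functions applied to each sampled completion. A toy sketch of a JSON-validity auxiliary reward follows; the actual reward functions and weights used in training are not published, so the shape and values here are purely illustrative:

```python
import json

def json_validity_reward(completion: str) -> float:
    """Toy auxiliary reward: full credit for parseable JSON, partial for near-misses."""
    text = completion.strip()
    try:
        json.loads(text)
        return 1.0
    except json.JSONDecodeError:
        # small partial credit if it at least looks like a JSON object
        return 0.2 if text.startswith("{") and text.endswith("}") else 0.0

print(json_validity_reward('{"name": "John"}'))  # 1.0
```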

---

## 📊 Planned Eval

* **Exact-Match JSON Accuracy**
* **Structural F1**
* **Valid-JSON Rate**

Benchmarks incoming; watch this space. 🛰️
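Of these, the valid-JSON and exact-match metrics reduce to simple checks once outputs are parsed. A toy sketch (Structural F1, which would compare key paths, is omitted; function name and return format are illustrative):

```python
import json

def eval_metrics(predictions, references):
    """Toy versions of the planned metrics over parallel lists of JSON strings."""
    valid = exact = 0
    for pred, ref in zip(predictions, references):
        try:
            parsed = json.loads(pred)
        except json.JSONDecodeError:
            continue  # invalid JSON counts against both metrics
        valid += 1
        if parsed == json.loads(ref):  # whitespace and key order don't matter after parsing
            exact += 1
    n = len(predictions)
    return {"valid_json_rate": valid / n, "exact_match": exact / n}

preds = ['{"a": 1}', 'not json']
refs  = ['{"a":1}', '{"b":2}']
print(eval_metrics(preds, refs))  # {'valid_json_rate': 0.5, 'exact_match': 0.5}
```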

---

## 🤝 Citation

If you build something cool with this model, a shout-out would be lovely:

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  author       = {Bhaviktheslider},
  year         = {2025},
  howpublished = {Hugging Face},
  note         = {\url{https://huggingface.co/bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer}}
}
```

*May your JSON always parse and your losses always converge!* 😎