| **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)

---

## 🚀 What's New?

> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**: now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade             | Explanation                                                                      |
|---------------------|----------------------------------------------------------------------------------|
| **Bigger Backbone** | 1.5 B → **3 B** Qwen 2.5 for bigger reasoning muscles.                            |
| **Pure RL**         | No supervised fine-tuning; the policy learned *only* from reward signals (GRPO).  |
| **LM-as-Judge**     | A separate LLM rates each candidate for correctness, JSON validity, style… (sketch below) |
| **2× Faster Train** | Unsloth's flash-attention & fused ops = less VRAM, more speed.                    |
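
To make the LM-as-Judge row concrete: a second LLM scores every sampled completion, and that score becomes (part of) the GRPO reward. The card does not publish its judge setup, so the sketch below is illustrative only; the judge model id, the rating prompt, and the 0-10 scale are all assumptions.

```python
from transformers import pipeline

# Hypothetical judge: model id, prompt, and scoring scheme are illustrative;
# the actual judge configuration used in training is not published.
judge = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct", max_new_tokens=8)

def judge_score(candidate_json: str, source_text: str) -> float:
    """Ask the judge LM for a 0-10 rating, normalized to a 0-1 reward."""
    prompt = (
        "Rate 0-10 how faithfully this JSON captures the text. "
        "Reply with a number only.\n"
        f"TEXT:\n{source_text}\n\nJSON:\n{candidate_json}\n\nSCORE:"
    )
    reply = judge(prompt, return_full_text=False)[0]["generated_text"].strip()
    try:
        score = float(reply.split()[0])
    except (IndexError, ValueError):
        score = 0.0                      # unparseable rating counts as zero reward
    return max(0.0, min(score, 10.0)) / 10.0
```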

---

## 🛠️ Intended Use

* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint (one-liner below) and enjoy the headroom.
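
Swapping really is one line, assuming your pipeline loads checkpoints by Hub repo id:

```python
# before: the 1.5 B SFT structurer
MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured"
# after: this 3 B GRPO model
MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"
```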

---

## 🔧 How to Use (Reasoning + JSON)

The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.

```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ──────────────────────────────────
class Table(BaseModel):          # minimal stand-in; swap in your real Table model
    headers: List[str]
    rows: List[List[str]]

class Checkbox(BaseModel):       # minimal stand-in; swap in your real Checkbox model
    label: str
    checked: bool

class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

SCHEMA_TEXT = inspect.getsource(Document)

# 2️⃣ Build prompts ───────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.

── Schema ──
{SCHEMA_TEXT}
────────────

Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
   final answer[ json object: {{ ... }} ]

Format:
<reasoning>…</reasoning>
<answer>
final answer[ json object: {{ … }} ]
</answer>
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 → 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following *hier* text to the schema.

### hier
{UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ────────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False)

prompt = f"<|system|>\n{SYSTEM_PROMPT}\n<|user|>\n{USER_PROMPT}"
raw_out = gen(prompt)[0]["generated_text"]

# 4️⃣ Slice out the JSON ──────────────────────────────────────────────────────
# rfind: the wrapper also appears once in the prompt, so grab the LAST occurrence
start = raw_out.rfind("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
data = json.loads(json_text)  # ✅ raises if malformed

print(raw_out)                        # reasoning + JSON
print("\n✅ Parsed object:\n", data)
```
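
The literal `<|system|>` / `<|user|>` tags above mirror the documented prompt format. If you prefer to let the tokenizer format messages, a chat-template variant is sketched below; it is untested against this checkpoint, since the card only documents the literal-tag format.

```python
# Alternative prompt construction via the tokenizer's chat template (sketch).
# Assumes the Qwen2.5 chat template shipped with the checkpoint.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": USER_PROMPT},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
raw_out = gen(prompt)[0]["generated_text"]
```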

### Why it Works 🧠

* **Schema-priming** ensures key-level fidelity: no "creative" field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner; a more defensive variant is sketched below.
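
For the one-liner with a seatbelt: a slightly more defensive extractor that takes the *last* wrapper occurrence and validates the parse against the quick-start's `Document` schema. It assumes pydantic v2 (`model_validate`); everything else comes from the snippet above.

```python
import json

def extract_final_json(raw_out: str) -> Document:
    """Grab the LAST `final answer[ json object: {...} ]` wrapper and validate it."""
    start = raw_out.rfind("final answer[")
    if start == -1:
        raise ValueError("wrapper not found in model output")
    end = raw_out.rfind("]") + 1
    json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
    return Document.model_validate(json.loads(json_text))  # raises on schema drift

doc = extract_final_json(raw_out)
print(doc.documentTitle, "|", len(doc.sections), "section(s)")
```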

---

## 🏋️ Training Recipe (Condensed)

| Setting        | Value                                                                                |
| -------------- | ------------------------------------------------------------------------------------ |
| **Algorithm**  | GRPO; policy = the LM being trained, reward LM = `Qwen2.5-7B` w/ JSON-validator head  |
| **Epochs**     | 3 (effective)                                                                         |
| **Batch**      | Grad-accum 8, bfloat16                                                                |
| **Optimizer**  | Fused AdamW                                                                           |
| **Throughput** | ≈ 45 k tokens/s on 8×A100                                                             |
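
The training script itself isn't published; as orientation only, here is a minimal TRL-style sketch of the GRPO loop. The dataset file is a placeholder, the reward is validity-only, and none of the LM-judge shaping from the real run is included.

```python
import json
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def valid_json_reward(completions, **kwargs):
    """1.0 if a completion contains parseable JSON, else 0.0 (hard signal only;
    the real run layered LM-judge scores for correctness/fidelity/style on top)."""
    def ok(c):
        try:
            json.loads(c[c.find("{"): c.rfind("}") + 1])
            return True
        except ValueError:
            return False
    return [1.0 if ok(c) else 0.0 for c in completions]

dataset = load_dataset("json", data_files="prompts.jsonl", split="train")  # placeholder file

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",   # base checkpoint named in this card
    reward_funcs=valid_json_reward,
    args=GRPOConfig(output_dir="grpo-json", bf16=True, gradient_accumulation_steps=8),
    train_dataset=dataset,                 # expects a "prompt" column
)
trainer.train()
```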

---

## 📊 Evaluation (WIP)

| Metric                    | Status |
| ------------------------- | ------ |
| Exact-Match JSON Accuracy | 🔜     |
| Structural F1             | 🔜     |
| Valid-JSON Rate           | 🔜     |
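
Until the official numbers land, two of the three are cheap to approximate yourself. A sketch, assuming you hold `(predicted, gold)` JSON string pairs; structural F1 needs a field-level matcher and is omitted here.

```python
import json

def eval_json_metrics(pairs):
    """pairs: list of (predicted_str, gold_str). Returns valid-JSON rate and exact match."""
    valid = exact = 0
    for pred, gold in pairs:
        try:
            p = json.loads(pred)
            valid += 1
            exact += (p == json.loads(gold))   # dict compare, key order irrelevant
        except ValueError:
            pass                               # malformed prediction counts against both
    n = len(pairs)
    return {"valid_json_rate": valid / n, "exact_match": exact / n}

print(eval_json_metrics([('{"a": 1}', '{"a":1}'), ('oops', '{}')]))
# {'valid_json_rate': 0.5, 'exact_match': 0.5}
```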

Stay tuned: numbers landing faster than you can say "schema validation." 🛰️

---

## 📖 Citation

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
  author       = {Bhaviktheslider},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
}
```

*May your JSON always parse and your losses always converge!* 🎉