bhaviktheslider committed
Commit 3ecf53c · verified · 1 Parent(s): 6f17c01

Update README.md

Files changed (1)
  1. README.md +138 -100
README.md CHANGED
@@ -23,141 +23,179 @@ language:
  | **Finetuned from**    | `unsloth/Qwen2.5-3B-Instruct` |
  | **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="180"/>](https://github.com/unslothai/unsloth)

  ---

  ## 🚀 What's New?
- Think of this as the protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**, but now with **3 B parameters, zero SFT, and a reward-only training regime** (GRPO), backed by an LM judge plus auxiliary reward functions.

- | Upgrade                | Explanation |
- |------------------------|-------------|
- | **Bigger Backbone**    | 1.5 B → **3 B** Qwen 2.5 for deeper reasoning headroom. |
- | **Pure RL**            | No supervised fine-tuning; the policy is learned *entirely* from reward signals. |
- | **LM-as-Judge**        | A separate LLM scores each candidate for correctness, JSON validity, length & style. |
- | **2× Faster Training** | Courtesy of Unsloth's memory savings (flash-attention, fused ops). |

  ---

  ## 🛠️ Intended Use
-
- Structured-data extraction from messy prose, logs, or transcripts.
- Drop-in upgrade for any pipeline currently using the older 1.5 B DeepSeek-R1 JSON-structurer.

  ---

- ## 🔧 How to Use
-
- Below is a minimal example that **re-uses the exact prompt format** from the previous model.
- The model first *reasons* silently (chain-of-thought is kept internal) and then emits only the target JSON.
-
- > **Model name** used in the snippet → `bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer`
- > Replace with your actual repo path if different.

- ### 1️⃣ Transformers Quick-Start

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
- import torch, json, textwrap

- MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit"

  tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
  model = AutoModelForCausalLM.from_pretrained(
      MODEL,
-     torch_dtype=torch.float16,
-     device_map="auto"
- )
-
- # --- Prompt (identical structure to previous model) ---
- system_prompt = (
-     "You are an intelligent JSON conversion engine. "
-     "Think step-by-step, and then output the final valid JSON."
  )

- task_prompt = textwrap.dedent("""\
-     ### Task
-     Convert the following unstructured text into the JSON schema shown below.
-     Return *only* valid JSON.
-
-     ### Schema
-     {
-       "name": str,
-       "age": int,
-       "city": str,
-       "skills": [str]
-     }
-
-     ### Unstructured text
-     John Doe, a 28-year-old software engineer living in Austin, loves Python and Golang.
- """)
-
- generator = pipeline(
-     "text-generation",
-     model=model,
-     tokenizer=tok,
-     max_new_tokens=256,
-     do_sample=False,
- )

- output = generator(f"<|system|>\n{system_prompt}\n<|user|>\n{task_prompt}")[0]["generated_text"]

- data = json.loads(output)  # ✅ will raise if JSON isn't valid
- print(data)
  ```

- ### 2️⃣ Text-Generation-Inference (TGI)

- ```bash
- # start server (8-bit, BF16, etc. as needed)
- text-generation-launcher --model-id MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit
-
- # curl call
- curl http://localhost:8080/generate \
-     -d '{
-       "inputs": "<|system|>\nYou are an intelligent JSON conversion engine. Think step-by-step, but ONLY output the final valid JSON.\n<|user|>\n### Task\nConvert the following unstructured text into the JSON schema shown below.\nReturn *only* valid JSON.\n\n### Schema\n{\"title\": str, \"authors\": [str], \"year\": int}\n\n### Unstructured text\n\"Deep Learning\" was written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville in 2016.\n",
-       "parameters": {"max_new_tokens": 256, "do_sample": false}
-     }'
- ```
-
- The response will be a pure JSON string, e.g.:
-
- ```json
- {"title":"Deep Learning","authors":["Ian Goodfellow","Yoshua Bengio","Aaron Courville"],"year":2016}
- ```
-
- ---
-
- ## 🤖 Why This Prompt Works
-
- 1. **System role** instructs the model to plan internally and expose *only* the JSON.
- 2. **Schema block** constrains the output keys & types.
- 3. **GRPO training** rewarded strict adherence to schema validity and penalized hallucinated keys or malformed JSON.
- 4. **LM-as-Judge** provides dense shaping signals: structural accuracy, content fidelity, token length, even stylistic consistency.
-
- The result: *reliable one-shot structuring without post-processing hacks*.

  ---

  ## 🏋️ Training Recipe (Condensed)

- | Setting              | Value |
- | -------------------- | ------------------------------------------------------------------- |
- | **Algorithm**        | GRPO (policy ≈ LM; reward LM ≈ `Qwen2.5-7B` w/ JSON validator head) |
- | **Effective Epochs** | 3 |
- | **Batching**         | Accum 8, bfloat16 |
- | **Optimizer**        | Fused AdamW |
- | **Throughput**       | ~45 k tokens/s on 8×A100 |

  ---

- ## 📊 Planned Eval

- * **Exact-Match JSON Accuracy**
- * **Structural F1**
- * **Valid-JSON Rate**

- Benchmarks incoming; watch this space. 🛰️

  ---

@@ -165,14 +203,14 @@ Benchmarks incoming; watch this space. 🛰️

  ```bibtex
  @misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
-   title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5 3B for JSON structuring},
    author       = {Bhaviktheslider},
    year         = {2025},
-   howpublished = {Hugging Face},
-   note         = {\url{https://huggingface.co/bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer}}
  }
  ```

  *May your JSON always parse and your losses always converge!* 😎

  | **Finetuned from**    | `unsloth/Qwen2.5-3B-Instruct` |
  | **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)

  ---

  ## 🚀 What's New?
+ > *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**, now with more neurons, zero SFT, and a league of reward functions.*

+ | Upgrade             | Explanation |
+ |---------------------|------------------------------------------------------------------------------|
+ | **Bigger Backbone** | 1.5 B → **3 B** Qwen 2.5 for bigger reasoning muscles. |
+ | **Pure RL**         | No supervised fine-tuning; the policy is learned *only* from reward signals (GRPO). |
+ | **LM-as-Judge**     | A separate LLM rates each candidate for correctness, JSON validity, style… (sketch below). |
+ | **2× Faster Train** | Unsloth's flash-attention & fused ops = less VRAM, more speed. |
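
To make the **LM-as-Judge** row concrete, here is a minimal sketch of what such a reward could look like. The judge checkpoint (`Qwen/Qwen2.5-7B-Instruct`), the 0-to-10 scoring prompt, and the 0.3/0.7 weighting are illustrative assumptions, not the repo's actual reward code.

```python
# Hypothetical LM-as-judge reward: a deterministic JSON-validity term plus a
# fidelity score asked from a separate judge model. All names are assumptions.
import json
import re

from transformers import pipeline

judge = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")

def json_validity_reward(completion: str) -> float:
    """1.0 if the completion parses as JSON, else 0.0."""
    try:
        json.loads(completion)
        return 1.0
    except json.JSONDecodeError:
        return 0.0

def judge_reward(source_text: str, completion: str) -> float:
    """Ask the judge for a 0-10 fidelity score and rescale it to [0, 1]."""
    prompt = (
        "Rate from 0 to 10 how faithfully the JSON below captures the text. "
        "Reply with a single number.\n\n"
        f"TEXT:\n{source_text}\n\nJSON:\n{completion}\n\nScore:"
    )
    reply = judge(prompt, max_new_tokens=8, return_full_text=False)[0]["generated_text"]
    match = re.search(r"\d+(\.\d+)?", reply)
    return min(float(match.group()), 10.0) / 10.0 if match else 0.0

def total_reward(source_text: str, completion: str) -> float:
    # Weighted mix of structural validity and judged content fidelity.
    return 0.3 * json_validity_reward(completion) + 0.7 * judge_reward(source_text, completion)
```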

  ---

  ## 🛠️ Intended Use
+ * Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
+ * Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint and enjoy the headroom.

  ---

+ ## 🔧 How to Use (Reasoning + JSON)
+ The snippet below:

+ 1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
+ 2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
+ 3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.

  ```python
+ # ─────────────────────────────────────────────────────────────────────────────
+ #  QUICK-START
+ #  Structured-data extraction with reasoning + JSON output
+ # ─────────────────────────────────────────────────────────────────────────────
  from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+ import torch, json, textwrap, inspect
+ from pydantic import BaseModel
+ from typing import List, Optional
+
+ MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"
+
+ # 1️⃣  Inline schema (keeps the LLM on-rails) ─────────────────────────────────
+ class MultipleChoice(BaseModel):
+     question: str
+     options: List[str]
+     selected: str
+
+ class FormField(BaseModel):
+     fieldName: str
+     value: str
+     notes: Optional[str] = ""
+
+ class Calculation(BaseModel):
+     formula: str
+     result: str
+     notes: Optional[str] = ""
+
+ class Metadata(BaseModel):
+     reportDate: str
+     auditorId: Optional[str] = None
+     comments: Optional[str] = None
+
+ class Content(BaseModel):
+     paragraphs: List[str]
+     tables: List["Table"]          # assume Table defined elsewhere
+     checkboxes: List["Checkbox"]   # 〃
+     multipleChoice: List[MultipleChoice]
+     formFields: List[FormField]
+     calculations: List[Calculation]
+     metadata: Optional[Metadata] = Metadata(reportDate="")
+
+ class Section(BaseModel):
+     id: str
+     title: str
+     content: Content
+
+ class Document(BaseModel):
+     documentTitle: str
+     documentDate: str
+     sections: List[Section]
+
+ SCHEMA_TEXT = inspect.getsource(Document)
+
+ # 2️⃣  Build prompts ──────────────────────────────────────────────────────────
+ SYSTEM_PROMPT = textwrap.dedent(f"""
+     You are an expert **data-extraction assistant**.
+     Extract structured info from unstructured text **exactly** following the Pydantic schema.
+
+     ── Schema ──
+     {SCHEMA_TEXT}
+     ─────────────
+
+     Rules:
+     1. Follow the schema for keys & nesting.
+     2. Copy values verbatim when possible.
+     3. If a field is missing, return null.
+     4. Output your step-by-step reasoning first.
+     5. Then return ONLY the JSON inside this wrapper:
+        final answer[ json object: {{ ... }} ]
+
+     Format:
+     <reasoning>…</reasoning>
+     <answer>
+     final answer[ json object: {{ … }} ]
+     </answer>
+ """).strip()
+
+ UNSTRUCTURED_TEXT = """
+ 12 April 2025 – Onsite audit performed by Jane Smith.
+ Observations: Two fire extinguishers past expiry; emergency lights functional.
+ Calculations: Total extinguishers = 8, expired = 2 → 25 % overdue.
+ """
+
+ USER_PROMPT = textwrap.dedent(f"""
+     ### Task
+     Convert the following *hier* text to the schema.
+
+     ### hier
+     {UNSTRUCTURED_TEXT}
+ """).strip()
+
+ # 3️⃣  Generate ───────────────────────────────────────────────────────────────
  tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
  model = AutoModelForCausalLM.from_pretrained(
      MODEL,
+     device_map="auto",
+     torch_dtype=torch.bfloat16
  )
+ gen = pipeline("text-generation", model=model, tokenizer=tok,
+                max_new_tokens=512, do_sample=False)
+
+ prompt = f"<|system|>\n{SYSTEM_PROMPT}\n<|user|>\n{USER_PROMPT}"
+ raw_out = gen(prompt)[0]["generated_text"]
+
+ # 4️⃣  Slice out the JSON ─────────────────────────────────────────────────────
+ start = raw_out.find("final answer[")
+ end = raw_out.rfind("]") + 1
+ json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
+ data = json.loads(json_text)   # ✅ Raises if malformed
+
+ print(raw_out)                 # reasoning + JSON
+ print("\n✅ Parsed object:\n", data)
  ```

+ ### Why it Works 🧐

+ * **Schema-priming** ensures key-level fidelity: no “creative” field names.
+ * **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
+ * The `final answer[…]` wrapper makes downstream parsing a one-liner (see the helper sketched below).
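
The wrapper plus the Pydantic schema means post-processing can be a single helper. A minimal sketch, assuming the model emits the `<answer>` / `final answer[ json object: … ]` format shown above, that the placeholder `Table` / `Checkbox` models are defined, and that pydantic v2 is installed (use `parse_obj` on v1):

```python
# Hypothetical helper: pull the JSON out of the wrapper and validate it against
# the `Document` model from the quick-start. The regex is an assumption about
# the wrapper format, not part of the released model's API.
import json
import re

WRAPPER = re.compile(r"final answer\[\s*json object:\s*(\{.*\})\s*\]", re.DOTALL)

def parse_final_answer(raw_out: str) -> "Document":
    match = WRAPPER.search(raw_out)
    if match is None:
        raise ValueError("no `final answer[ json object: ... ]` block found")
    payload = json.loads(match.group(1))      # raises on malformed JSON
    return Document.model_validate(payload)   # raises if the schema is violated

# doc = parse_final_answer(raw_out)
# print(doc.documentTitle, len(doc.sections))
```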

  ---

  ## 🏋️ Training Recipe (Condensed)

+ | Setting        | Value |
+ | -------------- | ------------------------------------------------------------------- |
+ | **Algorithm**  | GRPO – policy ≈ LM, reward LM ≈ `Qwen2.5-7B` w/ JSON-validator head |
+ | **Epochs**     | 3 (effective) |
+ | **Batch**      | Grad-accum 8, bfloat16 |
+ | **Optimizer**  | Fused AdamW |
+ | **Throughput** | ≈ 45 k tokens/s on 8×A100 |
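
For orientation, here is a rough sketch of how a recipe like this could be wired up with TRL's `GRPOTrainer` (Unsloth patches the same interface). Only the values echoed from the table above come from this card; the dataset path, the reward list, and the remaining hyper-parameters are placeholder assumptions.

```python
# Sketch only: dataset, reward list and most hyper-parameters are assumptions.
import json

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def valid_json_reward(completions, **kwargs):
    """+1 for every completion that parses as JSON, 0 otherwise."""
    rewards = []
    for text in completions:
        try:
            json.loads(text)
            rewards.append(1.0)
        except json.JSONDecodeError:
            rewards.append(0.0)
    return rewards

# Expects a dataset with a "prompt" column (placeholder file name).
dataset = load_dataset("json", data_files="unstructured_to_json.jsonl", split="train")

args = GRPOConfig(
    output_dir="qwen2.5-3b-grpo-json",
    num_train_epochs=3,                # "3 (effective)"
    gradient_accumulation_steps=8,     # "Grad-accum 8"
    bf16=True,                         # "bfloat16"
    optim="adamw_torch_fused",         # "Fused AdamW"
    num_generations=8,                 # candidates scored per prompt (assumption)
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",
    reward_funcs=[valid_json_reward],  # an LM-judge reward would slot in here as well
    args=args,
    train_dataset=dataset,
)
trainer.train()
```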

  ---

+ ## 📊 Evaluation (WIP)

+ | Metric                    | Status |
+ | ------------------------- | ------ |
+ | Exact-Match JSON Accuracy | 🔜     |
+ | Structural F1             | 🔜     |
+ | Valid-JSON Rate           | 🔜     |
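
Until those numbers land, here is a small sketch of how these three metrics are commonly computed; it reflects a plain reading of the metric names, not the card's official evaluation harness.

```python
# Sketch of the metrics above, assuming `predictions` are raw model strings and
# `references` are gold JSON objects. Not the card's official harness.
import json

def flatten(obj, prefix=""):
    """Yield (path, value) leaves of a nested JSON object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from flatten(value, f"{prefix}[{i}]")
    else:
        yield prefix, obj

def evaluate(predictions, references):
    valid = exact = tp = fp = fn = 0
    for pred_text, ref in zip(predictions, references):
        ref_leaves = set(flatten(ref))
        try:
            pred = json.loads(pred_text)
        except json.JSONDecodeError:
            fn += len(ref_leaves)          # invalid JSON misses every gold leaf
            continue
        valid += 1
        exact += int(pred == ref)
        pred_leaves = set(flatten(pred))
        tp += len(pred_leaves & ref_leaves)
        fp += len(pred_leaves - ref_leaves)
        fn += len(ref_leaves - pred_leaves)
    n = len(references)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"valid_json_rate": valid / n, "exact_match": exact / n, "structural_f1": f1}
```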

+ Stay tuned; numbers are landing faster than you can say “schema validation.” 🛰️

  ---

  ```bibtex
  @misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
+   title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
    author       = {Bhaviktheslider},
    year         = {2025},
+   howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
  }
  ```

  *May your JSON always parse and your losses always converge!* 😎