| **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)

---

## 🚀 What's New?

> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**: now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade             | Explanation                                                                      |
|---------------------|----------------------------------------------------------------------------------|
| **Bigger Backbone** | 1.5 B → **3 B** Qwen 2.5 for bigger reasoning muscles.                            |
| **Pure RL**         | No supervised fine-tuning; the policy learned *only* from reward signals (GRPO).  |
| **LM-as-Judge**     | A separate LLM rates each candidate for correctness, JSON validity, style… (sketch below) |
| **2× Faster Train** | Unsloth's flash-attention & fused ops = less VRAM, more speed.                    |
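
To make the LM-as-Judge row concrete: a second LLM scores every sampled completion, and that score becomes (part of) the GRPO reward. The card does not publish its judge setup, so the sketch below is illustrative only; the judge model id, the rating prompt, and the 0-10 scale are all assumptions.

```python
from transformers import pipeline

# Hypothetical judge: model id, prompt, and scoring scheme are illustrative;
# the actual judge configuration used in training is not published.
judge = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct", max_new_tokens=8)

def judge_score(candidate_json: str, source_text: str) -> float:
    """Ask the judge LM for a 0-10 rating, normalized to a 0-1 reward."""
    prompt = (
        "Rate 0-10 how faithfully this JSON captures the text. "
        "Reply with a number only.\n"
        f"TEXT:\n{source_text}\n\nJSON:\n{candidate_json}\n\nSCORE:"
    )
    reply = judge(prompt, return_full_text=False)[0]["generated_text"].strip()
    try:
        score = float(reply.split()[0])
    except (IndexError, ValueError):
        score = 0.0                      # unparseable rating counts as zero reward
    return max(0.0, min(score, 10.0)) / 10.0
```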

---

## 🛠️ Intended Use

* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint (one-liner below) and enjoy the headroom.
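
Swapping really is one line, assuming your pipeline loads checkpoints by Hub repo id:

```python
# before: the 1.5 B SFT structurer
MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured"
# after: this 3 B GRPO model
MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"
```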

---

## 🔧 How to Use (Reasoning + JSON)

The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.

```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ──────────────────────────────────
class Table(BaseModel):          # minimal stand-in; swap in your real Table model
    headers: List[str]
    rows: List[List[str]]

class Checkbox(BaseModel):       # minimal stand-in; swap in your real Checkbox model
    label: str
    checked: bool

class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

SCHEMA_TEXT = inspect.getsource(Document)

# 2️⃣ Build prompts ───────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.

── Schema ──
{SCHEMA_TEXT}
────────────

Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
   final answer[ json object: {{ ... }} ]

Format:
<reasoning>…</reasoning>
<answer>
final answer[ json object: {{ … }} ]
</answer>
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 → 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following *hier* text to the schema.

### hier
{UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ────────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False)

prompt = f"<|system|>\n{SYSTEM_PROMPT}\n<|user|>\n{USER_PROMPT}"
raw_out = gen(prompt)[0]["generated_text"]

# 4️⃣ Slice out the JSON ──────────────────────────────────────────────────────
# rfind: the wrapper also appears once in the prompt, so grab the LAST occurrence
start = raw_out.rfind("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
data = json.loads(json_text)  # ✅ raises if malformed

print(raw_out)                        # reasoning + JSON
print("\n✅ Parsed object:\n", data)
```
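
The literal `<|system|>` / `<|user|>` tags above mirror the documented prompt format. If you prefer to let the tokenizer format messages, a chat-template variant is sketched below; it is untested against this checkpoint, since the card only documents the literal-tag format.

```python
# Alternative prompt construction via the tokenizer's chat template (sketch).
# Assumes the Qwen2.5 chat template shipped with the checkpoint.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": USER_PROMPT},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
raw_out = gen(prompt)[0]["generated_text"]
```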

### Why it Works 🧠

* **Schema-priming** ensures key-level fidelity: no "creative" field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner; a more defensive variant is sketched below.
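
For the one-liner with a seatbelt: a slightly more defensive extractor that takes the *last* wrapper occurrence and validates the parse against the quick-start's `Document` schema. It assumes pydantic v2 (`model_validate`); everything else comes from the snippet above.

```python
import json

def extract_final_json(raw_out: str) -> Document:
    """Grab the LAST `final answer[ json object: {...} ]` wrapper and validate it."""
    start = raw_out.rfind("final answer[")
    if start == -1:
        raise ValueError("wrapper not found in model output")
    end = raw_out.rfind("]") + 1
    json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
    return Document.model_validate(json.loads(json_text))  # raises on schema drift

doc = extract_final_json(raw_out)
print(doc.documentTitle, "|", len(doc.sections), "section(s)")
```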

---

## 🏋️ Training Recipe (Condensed)

| Setting        | Value                                                                                |
| -------------- | ------------------------------------------------------------------------------------ |
| **Algorithm**  | GRPO; policy = the LM being trained, reward LM = `Qwen2.5-7B` w/ JSON-validator head  |
| **Epochs**     | 3 (effective)                                                                         |
| **Batch**      | Grad-accum 8, bfloat16                                                                |
| **Optimizer**  | Fused AdamW                                                                           |
| **Throughput** | ≈ 45 k tokens/s on 8×A100                                                             |
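
The training script itself isn't published; as orientation only, here is a minimal TRL-style sketch of the GRPO loop. The dataset file is a placeholder, the reward is validity-only, and none of the LM-judge shaping from the real run is included.

```python
import json
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def valid_json_reward(completions, **kwargs):
    """1.0 if a completion contains parseable JSON, else 0.0 (hard signal only;
    the real run layered LM-judge scores for correctness/fidelity/style on top)."""
    def ok(c):
        try:
            json.loads(c[c.find("{"): c.rfind("}") + 1])
            return True
        except ValueError:
            return False
    return [1.0 if ok(c) else 0.0 for c in completions]

dataset = load_dataset("json", data_files="prompts.jsonl", split="train")  # placeholder file

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",   # base checkpoint named in this card
    reward_funcs=valid_json_reward,
    args=GRPOConfig(output_dir="grpo-json", bf16=True, gradient_accumulation_steps=8),
    train_dataset=dataset,                 # expects a "prompt" column
)
trainer.train()
```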

---

## 📊 Evaluation (WIP)

| Metric                    | Status |
| ------------------------- | ------ |
| Exact-Match JSON Accuracy | 🔜     |
| Structural F1             | 🔜     |
| Valid-JSON Rate           | 🔜     |
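
Until the official numbers land, two of the three are cheap to approximate yourself. A sketch, assuming you hold `(predicted, gold)` JSON string pairs; structural F1 needs a field-level matcher and is omitted here.

```python
import json

def eval_json_metrics(pairs):
    """pairs: list of (predicted_str, gold_str). Returns valid-JSON rate and exact match."""
    valid = exact = 0
    for pred, gold in pairs:
        try:
            p = json.loads(pred)
            valid += 1
            exact += (p == json.loads(gold))   # dict compare, key order irrelevant
        except ValueError:
            pass                               # malformed prediction counts against both
    n = len(pairs)
    return {"valid_json_rate": valid / n, "exact_match": exact / n}

print(eval_json_metrics([('{"a": 1}', '{"a":1}'), ('oops', '{}')]))
# {'valid_json_rate': 0.5, 'exact_match': 0.5}
```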

Stay tuned: numbers landing faster than you can say "schema validation." 🛰️

---

## 📖 Citation

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
  author       = {Bhaviktheslider},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
}
```

*May your JSON always parse and your losses always converge!* 🎉