bhaviktheslider committed on
Commit 6f17c01 · verified · 1 Parent(s): 22aa838

Update README.md

Files changed (1)
  1. README.md +123 -29

README.md CHANGED
@@ -1,3 +1,4 @@
+
  ---
  # 🦄 Model Card
  base_model: unsloth/Qwen2.5-3B-Instruct
@@ -22,57 +23,145 @@ language:
  | **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
  | **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="180"/>](https://github.com/unslothai/unsloth)

  ---

  ## 🚀 What’s New?

- > **TL;DR** Think of this model as the beefed-up, protein-shake-powered sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured** … except we ditched the SFT and let a squad of reward functions do the coaching.
-
- ### Key Upgrades
- 1. **Larger Backbone** – We jumped from a 1.5 B-parameter model to a 3 B-parameter **Qwen 2.5** variant for more representational oomph.
- 2. **No SFT, All 🍬 Rewards** – Instead of supervised fine-tuning, training relied solely on reward-based optimization (GRPO).
-    - **LM-as-Judge**: A language model scored candidate outputs for task quality.
-    - **Auxiliary Rewards**: Style, length, and JSON-validity rewards kept the model on its best behavior.
- 3. **2× Faster Training** – Courtesy of Unsloth’s memory-efficient tricks (flash attention + fused optimizers).
+ Think of this as the protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**—but now with **3 B parameters, zero SFT, and a reward-only training regime** (GRPO) backed by an LM judge plus auxiliary reward functions.
+
+ | Upgrade | Explanation |
+ |---------|-------------|
+ | **Bigger Backbone** | 1.5 B → **3 B** Qwen 2.5 for deeper reasoning headroom. |
+ | **Pure RL** | No supervised fine-tuning—the policy learned *entirely* from reward signals. |
+ | **LM-as-Judge** | A separate LLM scores each candidate for correctness, JSON validity, length, and style. |
+ | **2× Faster Training** | Courtesy of Unsloth’s memory savings (flash attention, fused ops). |

  ---

  ## 🛠️ Intended Use

- - Converts messy, free-form text into structured JSON—exactly like its 1.5 B predecessor, but with a deeper knowledge reservoir and reinforcement-tuned precision.
- - Drop-in replacement for any pipeline already using the DeepSeek-R1 model. Just swap checkpoints and enjoy the headroom.
+ Structured-data extraction from messy prose, logs, or transcripts.
+ Drop-in upgrade for any pipeline currently using the older 1.5 B DeepSeek-R1 JSON structurer.

  ---

- ## 🏋️ Training Details
-
- | Item | Value |
- |------|-------|
- | **Base Model** | `unsloth/Qwen2.5-3B-Instruct` |
- | **Batching** | Gradient Accumulation 8, bfloat16 |
- | **Optimizer** | AdamW (fused) |
- | **Algorithm** | GRPO (policy ≈ LM; reward model ≈ separate LM judge) |
- | **Epochs** | 3 (effective) |
- | **Speed** | ~2× faster vs. vanilla PyTorch thanks to Unsloth |
-
- ---
-
- ## 📊 Evaluation (Coming Soon)
-
- We’re benchmarking against:
- - Exact-match JSON accuracy
- - Structural F1
- - Valid-JSON rate
-
- …stay tuned—numbers arriving faster than you can say “schema validation.”
-
- ---
-
- ## 🤝 Citation
-
- If you build something cool with this model, a shout-out would be lovely:
+ ## 🔧 How to Use
+
+ Below is a minimal example that **re-uses the exact prompt format** from the previous model.
+ The model first *reasons* silently (the chain of thought is kept internal) and then emits only the target JSON.
+
+ > **Model name** used in the snippet → `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit`
+ > Replace it with your own repo path (e.g. `bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer`) if you use a different copy.
+
+ ### 1️⃣ Transformers Quick-Start
+
+ ```python
+ import json
+ import textwrap
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+ MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit"
+
+ tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # --- Prompt (identical structure to the previous model) ---
+ system_prompt = (
+     "You are an intelligent JSON conversion engine. "
+     "Think step-by-step, and then output the final valid JSON."
+ )
+
+ task_prompt = textwrap.dedent("""\
+     ### Task
+     Convert the following unstructured text into the JSON schema shown below.
+     Return *only* valid JSON.
+
+     ### Schema
+     {
+       "name": str,
+       "age": int,
+       "city": str,
+       "skills": [str]
+     }
+
+     ### Unstructured text
+     John Doe, a 28-year-old software engineer living in Austin, loves Python and Golang.
+ """)
+
+ generator = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tok,
+     max_new_tokens=256,
+     do_sample=False,
+ )
+
+ # return_full_text=False strips the prompt so only the completion remains
+ output = generator(
+     f"<|system|>\n{system_prompt}\n<|user|>\n{task_prompt}",
+     return_full_text=False,
+ )[0]["generated_text"]
+
+ data = json.loads(output)  # ✅ raises if the output isn’t valid JSON
+ print(data)
+ ```
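+
+ If you'd rather not hand-roll the `<|system|>` / `<|user|>` tags, the tokenizer's built-in chat template produces an equivalent request. A minimal sketch, assuming this checkpoint keeps the stock Qwen2.5 chat template (it continues the snippet above):
+
+ ```python
+ # Same call via the tokenizer's chat template (assumes the stock Qwen2.5 template).
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": task_prompt},
+ ]
+ prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ output = generator(prompt, return_full_text=False)[0]["generated_text"]
+ print(json.loads(output))
+ ```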
+
+ ### 2️⃣ Text-Generation-Inference (TGI)
+
+ ```bash
+ # start the server (add 8-bit, BF16, etc. flags as needed); --port matches the curl below
+ text-generation-launcher --model-id MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit --port 8080
+
+ # query it
+ curl http://localhost:8080/generate \
+     -H 'Content-Type: application/json' \
+     -d '{
+         "inputs": "<|system|>\nYou are an intelligent JSON conversion engine. Think step-by-step, but ONLY output the final valid JSON.\n<|user|>\n### Task\nConvert the following unstructured text into the JSON schema shown below.\nReturn *only* valid JSON.\n\n### Schema\n{\"title\": str, \"authors\": [str], \"year\": int}\n\n### Unstructured text\n\"Deep Learning\" was written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville in 2016.\n",
+         "parameters": {"max_new_tokens": 256, "do_sample": false}
+     }'
+ ```
+
+ The `generated_text` field of the response is a pure JSON string, e.g.:
+
+ ```json
+ {"title": "Deep Learning", "authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"], "year": 2016}
+ ```
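+
+ To enforce the schema programmatically, a minimal sketch with the third-party `jsonschema` package (our assumption; it isn't bundled with this model) could look like this:
+
+ ```python
+ # Sketch: validate model output against a JSON Schema mirroring the curl example.
+ # Requires `pip install jsonschema`.
+ import json
+ from jsonschema import ValidationError, validate
+
+ schema = {
+     "type": "object",
+     "properties": {
+         "title": {"type": "string"},
+         "authors": {"type": "array", "items": {"type": "string"}},
+         "year": {"type": "integer"},
+     },
+     "required": ["title", "authors", "year"],
+ }
+
+ raw = '{"title": "Deep Learning", "authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"], "year": 2016}'
+ try:
+     validate(json.loads(raw), schema)
+     print("schema-valid ✅")
+ except (json.JSONDecodeError, ValidationError) as err:
+     print(f"rejected ❌: {err}")
+ ```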
+
+ ---
+
+ ## 🤖 Why This Prompt Works
+
+ 1. **System role** instructs the model to plan internally and expose *only* the JSON.
+ 2. **Schema block** constrains the output keys & types.
+ 3. **GRPO training** rewarded strict adherence to schema validity and penalized hallucinated keys or malformed JSON.
+ 4. **LM-as-Judge** provides dense shaping signals: structural accuracy, content fidelity, token length, even stylistic consistency (see the illustrative judge sketch below).
+
+ The result: *reliable one-shot structuring without post-processing hacks*.
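+
+ For intuition only, here is a toy sketch of the LM-as-judge idea: the judge model, prompt, and rubric are illustrative assumptions, **not** the actual reward stack used in training.
+
+ ```python
+ # Toy judge: score a candidate JSON 0–10 with a separate LLM, normalized to [0, 1].
+ import re
+ from transformers import pipeline
+
+ judge = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")  # hypothetical judge
+
+ def judge_score(source_text: str, candidate_json: str) -> float:
+     prompt = (
+         "Rate from 0 to 10 how faithfully the JSON captures the text. "
+         "Reply with a single number.\n"
+         f"Text: {source_text}\nJSON: {candidate_json}\nScore:"
+     )
+     reply = judge(prompt, max_new_tokens=8, return_full_text=False)[0]["generated_text"]
+     match = re.search(r"\d+(\.\d+)?", reply)
+     return min(float(match.group()), 10.0) / 10.0 if match else 0.0
+ ```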
+
+ ---
+
+ ## 🏋️ Training Recipe (Condensed)
+
+ | Setting | Value |
+ |---------|-------|
+ | **Algorithm** | GRPO (policy ≈ LM; reward LM ≈ `Qwen2.5-7B` w/ JSON validator head) |
+ | **Effective Epochs** | 3 |
+ | **Batching** | Gradient accumulation 8, bfloat16 |
+ | **Optimizer** | Fused AdamW |
+ | **Throughput** | ~45 k tokens/s on 8×A100 |
+
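+ The exact training script isn't published; as a rough, hedged sketch, reward-only GRPO with simple auxiliary rewards can be wired up in TRL roughly like this (the reward functions and toy dataset are our assumptions, not the actual ones used):
+
+ ```python
+ # Sketch: GRPO in TRL with deterministic auxiliary rewards (illustrative).
+ # Each reward function maps a batch of completions to one float per completion.
+ import json
+ from datasets import Dataset
+ from trl import GRPOConfig, GRPOTrainer
+
+ def valid_json_reward(completions, **kwargs):
+     # +1 if the completion parses as JSON, else -1.
+     scores = []
+     for text in completions:
+         try:
+             json.loads(text)
+             scores.append(1.0)
+         except json.JSONDecodeError:
+             scores.append(-1.0)
+     return scores
+
+ def length_reward(completions, **kwargs):
+     # Mild penalty against rambling outputs.
+     return [-0.001 * len(text) for text in completions]
+
+ dataset = Dataset.from_dict({"prompt": ["Convert to JSON: John Doe, 28, Austin."]})  # toy
+
+ trainer = GRPOTrainer(
+     model="unsloth/Qwen2.5-3B-Instruct",
+     reward_funcs=[valid_json_reward, length_reward],  # an LM judge could be a third entry
+     args=GRPOConfig(output_dir="grpo-json", bf16=True, gradient_accumulation_steps=8),
+     train_dataset=dataset,
+ )
+ trainer.train()
+ ```
+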
+ ---
+
+ ## 📊 Planned Eval
+
+ * **Exact-Match JSON Accuracy**
+ * **Structural F1**
+ * **Valid-JSON Rate**
+
+ Benchmarks incoming—watch this space. 🛰️ (A sketch of the metric definitions follows.)
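+
+ Until official numbers land, here is a hedged sketch of how the three metrics can be computed over (prediction, reference) JSON-string pairs; the helper is ours, not an official eval harness:
+
+ ```python
+ # Sketch: valid-JSON rate, exact-match accuracy, and a structural F1 computed
+ # over flattened (path, value) pairs. The pair format is an assumption.
+ import json
+
+ def flatten(obj, prefix=""):
+     # Yield (path, leaf value) pairs from nested JSON.
+     if isinstance(obj, dict):
+         for key, val in obj.items():
+             yield from flatten(val, f"{prefix}{key}.")
+     elif isinstance(obj, list):
+         for i, val in enumerate(obj):
+             yield from flatten(val, f"{prefix}{i}.")
+     else:
+         yield (prefix.rstrip("."), obj)
+
+ def evaluate(pairs):
+     valid = exact = 0
+     f1_total = 0.0
+     for pred_str, ref_str in pairs:
+         try:
+             pred = json.loads(pred_str)
+         except json.JSONDecodeError:
+             continue  # invalid JSON scores zero on all three metrics
+         valid += 1
+         ref = json.loads(ref_str)
+         exact += int(pred == ref)
+         p, r = set(flatten(pred)), set(flatten(ref))
+         hits = len(p & r)
+         if hits:
+             prec, rec = hits / len(p), hits / len(r)
+             f1_total += 2 * prec * rec / (prec + rec)
+     n = len(pairs)
+     return {
+         "valid_json_rate": valid / n,
+         "exact_match": exact / n,
+         "structural_f1": f1_total / n,
+     }
+
+ print(evaluate([('{"a": 1}', '{"a": 1}')]))  # all three metrics = 1.0
+ ```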
+
+ ---
+
+ ## 🤝 Citation
+
  ```bibtex
  @misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,

@@ -80,5 +169,10 @@ If you build something cool with this model, a shout-out would be lovely:
  author = {Bhaviktheslider},
  year = {2025},
  howpublished = {Hugging Face},
- note = {https://huggingface.co/bhaviktheslider/<repo>}
+ note = {\url{https://huggingface.co/bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer}}
  }
+ ```
+
+ *May your JSON always parse and your losses always converge!* 😎