---
# πŸ¦„ Model Card
base_model: unsloth/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo # Group Relative Policy Optimization
license: apache-2.0
language:
- en
---

# πŸ“¦ Uploaded Model

| **Field**              | **Value**                                                          |
|------------------------|--------------------------------------------------------------------|
| **Developed by**       | **MasterControlAIML**                                              |
| **License**            | Apache 2.0                                                         |
| **Finetuned from**     | `unsloth/Qwen2.5-3B-Instruct`                                      |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) Γ— Hugging Face TRL |

---

## πŸš€ What’s New?

> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**β€”now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade             | Explanation                                                                     |
|---------------------|---------------------------------------------------------------------------------|
| **Bigger Backbone** | 1.5 B β†’ **3 B** Qwen 2.5 for bigger reasoning muscles.                          |
| **Pure RL**         | No supervised fine-tuningβ€”the policy learned *only* from reward signals (GRPO). |
| **LM-as-Judge**     | A separate LLM rates each candidate for correctness, JSON validity, style…      |
| **2Γ— Faster Train** | Unsloth’s flash-attention & fused ops = less VRAM, more speed.                  |

---

## πŸ› οΈ Intended Use

* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurerβ€”just swap the checkpoint and enjoy the headroom.

---

## πŸ”§ How to Use (Reasoning + JSON)

The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.
```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ─────────────────────────────────
class Table(BaseModel):          # minimal stand-in; adjust to your real schema
    headers: List[str]
    rows: List[List[str]]

class Checkbox(BaseModel):       # minimal stand-in; adjust to your real schema
    label: str
    checked: bool

class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

# Hand the model the *whole* schema, not just the top-level class.
SCHEMA_TEXT = "\n".join(
    inspect.getsource(cls)
    for cls in (Table, Checkbox, MultipleChoice, FormField,
                Calculation, Metadata, Content, Section, Document)
)

# 2️⃣ Build prompts ──────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.

── Schema ──
{SCHEMA_TEXT}
─────────────

Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
   final answer[
     json object: {{ ... }}
   ]

Format:
…
final answer[
  json object: {{ … }}
]
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 β†’ 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following unstructured text to the schema.

### Unstructured text
{UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ───────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", torch_dtype=torch.bfloat16
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False)

# Qwen 2.5 speaks ChatML, so let the tokenizer's chat template build the prompt
# instead of hand-rolled <|system|>/<|user|> tags.
prompt = tok.apply_chat_template(
    [{"role": "system", "content": SYSTEM_PROMPT},
     {"role": "user",   "content": USER_PROMPT}],
    tokenize=False, add_generation_prompt=True,
)
# return_full_text=False keeps the prompt (which itself contains the words
# "final answer[") out of the text we slice below.
raw_out = gen(prompt, return_full_text=False)[0]["generated_text"]

# 4️⃣ Slice out the JSON ─────────────────────────────────────────────────────
start = raw_out.find("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
data = json.loads(json_text)  # βœ… Raises if malformed

print(raw_out)  # reasoning + JSON
print("\nβœ… Parsed object:\n", data)
```

### Why it Works 🧐

* **Schema-priming** ensures key-level fidelityβ€”no β€œcreative” field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner (see the robust helper below).
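As promised, the parsing step can be a one-liner; here is a slightly more defensive version wrapped in a tiny helper. The regex is one reasonable way to match the `final answer[ json object: {…} ]` wrapper defined in the prompt above, and `extract_final_json` is our name for the helper, not part of the model's API:

```python
import re, json

def extract_final_json(model_output: str) -> dict:
    """Pull the JSON payload out of the `final answer[ json object: {...} ]` wrapper."""
    match = re.search(r"final answer\[\s*json object:\s*(\{.*\})\s*\]",
                      model_output, flags=re.DOTALL)
    if match is None:
        raise ValueError("No `final answer[...]` wrapper found in model output.")
    return json.loads(match.group(1))

# Usage with the quick-start above:
# data = extract_final_json(raw_out)
```

And if you want a hard guarantee rather than just parseable JSON, you can round-trip the parsed dict through the same Pydantic schema. A minimal optional step, assuming Pydantic v2 (`model_validate`); on v1, use `Document.parse_obj(data)` instead:

```python
# Validate the parsed dict against the schema from the quick-start.
doc = Document.model_validate(data)  # raises ValidationError on schema drift
print(doc.documentTitle, "β†’", len(doc.sections), "section(s)")
```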
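Curious what β€œpure RL with a league of reward functions” looks like in code? The condensed recipe follows in the next section; as a complement, here is a minimal, *hypothetical* sketch of the reward side using TRL's `GRPOTrainer`. This is not the actual training script: the reward functions, toy dataset, and hyper-parameters are illustrative stand-ins (TRL accepts plain Python callables as rewards).

```python
# Illustrative GRPO sketch (NOT the exact recipe used for this model).
# Assumes trl >= 0.14 (GRPOTrainer/GRPOConfig) and string completions,
# i.e. a standard (non-conversational) dataset with a "prompt" column.
import json, re
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

WRAPPER = re.compile(r"final answer\[\s*json object:\s*(\{.*\})\s*\]", re.DOTALL)

def reward_valid_json(completions, **kwargs):
    """+1.0 when the completion contains a parseable `final answer[...]` block."""
    scores = []
    for text in completions:
        m = WRAPPER.search(text)
        if m is None:
            scores.append(0.0)
            continue
        try:
            json.loads(m.group(1))
            scores.append(1.0)
        except json.JSONDecodeError:
            scores.append(0.0)
    return scores

def reward_llm_judge(completions, **kwargs):
    """Placeholder for the LM-as-judge signal: in the real recipe a separate
    LLM scores each candidate for correctness/style; stubbed to 0.0 here."""
    return [0.0 for _ in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["Convert this audit note to the schema: ..."]}  # toy example
)

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",                 # the base model of this card
    reward_funcs=[reward_valid_json, reward_llm_judge],  # rewards are just callables
    args=GRPOConfig(output_dir="grpo-json-demo", num_generations=4),
    train_dataset=train_dataset,
)
trainer.train()
```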
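In practice, stacking a cheap deterministic reward (JSON validity) with an expensive judged one (LM scoring) is what lets GRPO shape both *form* and *content* without any supervised labels.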
--- ## πŸ‹οΈ Training Recipe (Condensed) | Setting | Value | | -------------- | ------------------------------------------------------------------- | | **Algorithm** | GRPO – policy β‰ˆ LM, reward LM β‰ˆ `Qwen2.5-7B` w/ JSON-validator head | | **Epochs** | 3 (effective) | | **Batch** | Grad-accum 8, bfloat16 | | **Optimizer** | Fused AdamW | | **Throughput** | β‰ˆ 45 k tokens/s on 8Γ—A100 | --- ## πŸ“Š Evaluation (WIP) | Metric | Status | | ------------------------- | ------ | | Exact-Match JSON Accuracy | πŸ”œ | | Structural F1 | πŸ”œ | | Valid-JSON Rate | πŸ”œ | Stay tunedβ€”numbers landing faster than you can say β€œschema validation.” πŸ›°οΈ --- ## 🀝 Citation ```bibtex @misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo, title = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring}, author = {MasterControlAIML}, year = {2025}, howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}} } ``` *May your JSON always parse and your losses always converge!* 😎 ```