---
# πŸ¦„ Model Card
base_model: unsloth/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo # Group Relative Policy Optimization
license: apache-2.0
language:
- en
---

# πŸ“¦ Uploaded Model

| **Field**              | **Value**                                                          |
|------------------------|--------------------------------------------------------------------|
| **Developed by**       | **MasterControlAIML**                                              |
| **License**            | Apache 2.0                                                         |
| **Finetuned from**     | `unsloth/Qwen2.5-3B-Instruct`                                      |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) Γ— Hugging Face TRL |

---

## πŸš€ What’s New?

> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**β€”now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade             | Explanation                                                                     |
|---------------------|---------------------------------------------------------------------------------|
| **Bigger Backbone** | 1.5 B β†’ **3 B** Qwen 2.5 for bigger reasoning muscles.                          |
| **Pure RL**         | No supervised fine-tuningβ€”the policy learned *only* from reward signals (GRPO). |
| **LM-as-Judge**     | A separate LLM rates each candidate for correctness, JSON validity, style…      |
| **2Γ— Faster Train** | Unsloth’s flash-attention & fused ops = less VRAM, more speed.                  |

---

## πŸ› οΈ Intended Use

* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurerβ€”just swap the checkpoint and enjoy the headroom.

---

## πŸ”§ How to Use (Reasoning + JSON)

The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.
```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ─────────────────────────────────
class Table(BaseModel):          # minimal stand-in; adjust to your real schema
    headers: List[str]
    rows: List[List[str]]

class Checkbox(BaseModel):       # minimal stand-in; adjust to your real schema
    label: str
    checked: bool

class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

# Hand the model the *whole* schema, not just the top-level class.
SCHEMA_TEXT = "\n".join(
    inspect.getsource(cls)
    for cls in (Table, Checkbox, MultipleChoice, FormField,
                Calculation, Metadata, Content, Section, Document)
)

# 2️⃣ Build prompts ──────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.

── Schema ──
{SCHEMA_TEXT}
─────────────

Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
   final answer[
     json object: {{ ... }}
   ]

Format:
…
final answer[
  json object: {{ … }}
]
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 β†’ 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following unstructured text to the schema.

### Unstructured text
{UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ───────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, device_map="auto", torch_dtype=torch.bfloat16
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False)

# Qwen 2.5 speaks ChatML, so let the tokenizer's chat template build the prompt
# instead of hand-rolled <|system|>/<|user|> tags.
prompt = tok.apply_chat_template(
    [{"role": "system", "content": SYSTEM_PROMPT},
     {"role": "user",   "content": USER_PROMPT}],
    tokenize=False, add_generation_prompt=True,
)
# return_full_text=False keeps the prompt (which itself contains the words
# "final answer[") out of the text we slice below.
raw_out = gen(prompt, return_full_text=False)[0]["generated_text"]

# 4️⃣ Slice out the JSON ─────────────────────────────────────────────────────
start = raw_out.find("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
data = json.loads(json_text)  # βœ… Raises if malformed

print(raw_out)  # reasoning + JSON
print("\nβœ… Parsed object:\n", data)
```

### Why it Works 🧐

* **Schema-priming** ensures key-level fidelityβ€”no β€œcreative” field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner (see the robust helper below).
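As promised, the parsing step can be a one-liner; here is a slightly more defensive version wrapped in a tiny helper. The regex is one reasonable way to match the `final answer[ json object: {…} ]` wrapper defined in the prompt above, and `extract_final_json` is our name for the helper, not part of the model's API:

```python
import re, json

def extract_final_json(model_output: str) -> dict:
    """Pull the JSON payload out of the `final answer[ json object: {...} ]` wrapper."""
    match = re.search(r"final answer\[\s*json object:\s*(\{.*\})\s*\]",
                      model_output, flags=re.DOTALL)
    if match is None:
        raise ValueError("No `final answer[...]` wrapper found in model output.")
    return json.loads(match.group(1))

# Usage with the quick-start above:
# data = extract_final_json(raw_out)
```

And if you want a hard guarantee rather than just parseable JSON, you can round-trip the parsed dict through the same Pydantic schema. A minimal optional step, assuming Pydantic v2 (`model_validate`); on v1, use `Document.parse_obj(data)` instead:

```python
# Validate the parsed dict against the schema from the quick-start.
doc = Document.model_validate(data)  # raises ValidationError on schema drift
print(doc.documentTitle, "β†’", len(doc.sections), "section(s)")
```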
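Curious what β€œpure RL with a league of reward functions” looks like in code? The condensed recipe follows in the next section; as a complement, here is a minimal, *hypothetical* sketch of the reward side using TRL's `GRPOTrainer`. This is not the actual training script: the reward functions, toy dataset, and hyper-parameters are illustrative stand-ins (TRL accepts plain Python callables as rewards).

```python
# Illustrative GRPO sketch (NOT the exact recipe used for this model).
# Assumes trl >= 0.14 (GRPOTrainer/GRPOConfig) and string completions,
# i.e. a standard (non-conversational) dataset with a "prompt" column.
import json, re
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

WRAPPER = re.compile(r"final answer\[\s*json object:\s*(\{.*\})\s*\]", re.DOTALL)

def reward_valid_json(completions, **kwargs):
    """+1.0 when the completion contains a parseable `final answer[...]` block."""
    scores = []
    for text in completions:
        m = WRAPPER.search(text)
        if m is None:
            scores.append(0.0)
            continue
        try:
            json.loads(m.group(1))
            scores.append(1.0)
        except json.JSONDecodeError:
            scores.append(0.0)
    return scores

def reward_llm_judge(completions, **kwargs):
    """Placeholder for the LM-as-judge signal: in the real recipe a separate
    LLM scores each candidate for correctness/style; stubbed to 0.0 here."""
    return [0.0 for _ in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["Convert this audit note to the schema: ..."]}  # toy example
)

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",                 # the base model of this card
    reward_funcs=[reward_valid_json, reward_llm_judge],  # rewards are just callables
    args=GRPOConfig(output_dir="grpo-json-demo", num_generations=4),
    train_dataset=train_dataset,
)
trainer.train()
```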
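In practice, stacking a cheap deterministic reward (JSON validity) with an expensive judged one (LM scoring) is what lets GRPO shape both *form* and *content* without any supervised labels.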
--- ## πŸ‹οΈ Training Recipe (Condensed) | Setting | Value | | -------------- | ------------------------------------------------------------------- | | **Algorithm** | GRPO – policy β‰ˆ LM, reward LM β‰ˆ `Qwen2.5-7B` w/ JSON-validator head | | **Epochs** | 3 (effective) | | **Batch** | Grad-accum 8, bfloat16 | | **Optimizer** | Fused AdamW | | **Throughput** | β‰ˆ 45 k tokens/s on 8Γ—A100 | --- ## πŸ“Š Evaluation (WIP) | Metric | Status | | ------------------------- | ------ | | Exact-Match JSON Accuracy | πŸ”œ | | Structural F1 | πŸ”œ | | Valid-JSON Rate | πŸ”œ | Stay tunedβ€”numbers landing faster than you can say β€œschema validation.” πŸ›°οΈ --- ## 🀝 Citation ```bibtex @misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo, title = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring}, author = {MasterControlAIML}, year = {2025}, howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}} } ``` *May your JSON always parse and your losses always converge!* 😎 ```