---
base_model: unsloth/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo # Group Relative Policy Optimization
license: apache-2.0
language:
- en
---
# πŸ“¦ Uploaded Model
| **Field** | **Value** |
|-----------------------|--------------------------------------------|
| **Developed by** | **MasterControlAIML** |
| **License** | Apache 2.0 |
| **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
| **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) Γ— Hugging Face TRL |
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)
---
## πŸš€ What’s New?
> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**β€”now with more neurons, zero SFT, and a league of reward functions.*
| Upgrade | Explanation |
|--------------------|------------------------------------------------------------------------------|
| **Bigger Backbone**| 1.5 B β†’ **3 B** Qwen 2.5 for bigger reasoning muscles. |
| **Pure RL** | No supervised fine-tuningβ€”policy learned *only* from reward signals (GRPO). |
| **LM-as-Judge** | Separate LLM rates each candidate for correctness, JSON validity, style… (reward shape sketched below). |
| **2Γ— Faster Train**| Unsloth’s flash-attention & fused ops = less VRAM, more speed. |
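
To make the reward side concrete, here is a minimal sketch of one reward function in the shape TRL's `GRPOTrainer` expects (one float per completion). The regex and scoring below are illustrative assumptions, not the exact functions used in training; the LM-as-judge rewards for correctness and style follow the same signature:

```python
import json
import re

def valid_json_reward(completions, **kwargs):
    """Illustrative GRPO reward: 1.0 if the completion's
    `final answer[ json object: {...} ]` wrapper parses as JSON, else 0.0.
    Assumes plain-text (non-conversational) completions."""
    rewards = []
    for text in completions:
        match = re.search(r"json object:\s*(\{.*\})\s*\]", text, re.DOTALL)
        if match is None:
            rewards.append(0.0)
            continue
        try:
            json.loads(match.group(1))
            rewards.append(1.0)
        except json.JSONDecodeError:
            rewards.append(0.0)
    return rewards
```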
---
## πŸ› οΈ Intended Use
* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint (shown below) and enjoy the headroom.
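
Swapping checkpoints really is a one-line change; the prompt format and parsing code stay the same:

```python
# Before: the 1.5B SFT structurer
MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured"

# After: this 3B GRPO checkpoint
MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"
```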
---
## πŸ”§ How to Use (Reasoning + JSON)
The snippet below:
1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.
```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional
MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"
# 1️⃣ Inline schema (keeps the LLM on-rails) ─────────────────────────────────
class MultipleChoice(BaseModel):
question: str
options: List[str]
selected: str
class FormField(BaseModel):
fieldName: str
value: str
notes: Optional[str] = ""
class Calculation(BaseModel):
formula: str
result: str
notes: Optional[str] = ""
class Metadata(BaseModel):
reportDate: str
auditorId: Optional[str] = None
comments: Optional[str] = None
# Minimal placeholder definitions so the snippet is self-contained
# (swap in your real Table / Checkbox schemas as needed).
class Table(BaseModel):
    headers: List[str] = []
    rows: List[List[str]] = []

class Checkbox(BaseModel):
    label: str = ""
    checked: bool = False

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")
class Section(BaseModel):
id: str
title: str
content: Content
class Document(BaseModel):
documentTitle: str
documentDate: str
sections: List[Section]
# Include every model in the prompt so the LLM also sees the nested keys.
SCHEMA_TEXT = "\n".join(
    inspect.getsource(cls)
    for cls in (MultipleChoice, FormField, Calculation, Metadata,
                Table, Checkbox, Content, Section, Document)
)
# 2️⃣ Build prompts ──────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.
── Schema ──
{SCHEMA_TEXT}
─────────────
Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
final answer[ json object: {{ ... }} ]
Format:
<reasoning>…</reasoning>
<answer>
final answer[ json object: {{ … }} ]
</answer>
""").strip()
UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 β†’ 25 % overdue.
"""
USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following unstructured text to the schema.

### Unstructured text
{UNSTRUCTURED_TEXT}
""").strip()
# 3️⃣ Generate ───────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
MODEL,
device_map="auto",
torch_dtype=torch.bfloat16
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False, return_full_text=False)

# Qwen 2.5 expects ChatML formatting; let the tokenizer's chat template build it.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": USER_PROMPT},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
raw_out = gen(prompt)[0]["generated_text"]  # completion only (reasoning + answer)
# 4️⃣ Slice out the JSON ─────────────────────────────────────────────────────
start = raw_out.find("json object:") + len("json object:")
end   = raw_out.rfind("]")            # closing bracket of the wrapper
json_text = raw_out[start:end].strip()
data = json.loads(json_text)          # βœ… Raises if malformed
print(raw_out) # reasoning + JSON
print("\nβœ… Parsed object:\n", data)
```
### Why it Works 🧐
* **Schema-priming** ensures key-level fidelityβ€”no β€œcreative” field names.
* **Chain-of-thought** improves factual extraction; step-by-step reasoning was explicitly rewarded during GRPO training.
* The `final answer[…]` wrapper makes downstream parsing a one-liner.
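
The parse itself is one line; adding schema validation makes it two. A minimal sketch, assuming Pydantic v2 and the wrapper format shown above:

```python
import json, re

payload = re.search(r"json object:\s*(\{.*\})\s*\]", raw_out, re.DOTALL).group(1)
doc = Document.model_validate(json.loads(payload))  # schema-checked Document instance
```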
---
## πŸ‹οΈ Training Recipe (Condensed)
| Setting | Value |
| -------------- | ------------------------------------------------------------------- |
| **Algorithm** | GRPO – policy β‰ˆ LM, reward LM β‰ˆ `Qwen2.5-7B` w/ JSON-validator head |
| **Epochs** | 3 (effective) |
| **Batch** | Grad-accum 8, bfloat16 |
| **Optimizer** | Fused AdamW |
| **Throughput** | β‰ˆ 45 k tokens/s on 8Γ—A100 |
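
For orientation, a minimal sketch of what this setup looks like in TRL. The dataset path, hyperparameters, and single reward function are illustrative, not the exact training script; `valid_json_reward` is the sketch from above, and in practice the LM-as-judge rewards sit alongside it:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt dataset: a "prompt" column of unstructured texts.
train_ds = load_dataset("json", data_files="prompts.jsonl", split="train")

config = GRPOConfig(
    output_dir="qwen2.5-3b-json-grpo",
    num_train_epochs=3,
    gradient_accumulation_steps=8,
    bf16=True,
    optim="adamw_torch_fused",
)

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",
    reward_funcs=[valid_json_reward],  # plus LM-as-judge rewards in practice
    args=config,
    train_dataset=train_ds,
)
trainer.train()
```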
---
## πŸ“Š Evaluation (WIP)
| Metric | Status |
| ------------------------- | ------ |
| Exact-Match JSON Accuracy | πŸ”œ |
| Structural F1 | πŸ”œ |
| Valid-JSON Rate | πŸ”œ |
Stay tunedβ€”numbers landing faster than you can say β€œschema validation.” πŸ›°οΈ
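
In the meantime, a minimal sketch of how Valid-JSON Rate and Exact-Match could be computed over a batch of outputs (helper names are illustrative):

```python
import json
import re

def extract_json(text: str):
    """Return the parsed JSON payload from the wrapper, or None on failure."""
    match = re.search(r"json object:\s*(\{.*\})\s*\]", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None

def valid_json_rate(outputs: list[str]) -> float:
    return sum(extract_json(o) is not None for o in outputs) / max(len(outputs), 1)

def exact_match_rate(outputs: list[str], golds: list[dict]) -> float:
    return sum(extract_json(o) == g for o, g in zip(outputs, golds)) / max(len(golds), 1)
```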
---
## 🀝 Citation
```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
title = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
author = {MasterControlAIML},
year = {2025},
howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
}
```
*May your JSON always parse and your losses always converge!* 😎