---
base_model: unsloth/Qwen2.5-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- grpo
license: apache-2.0
language:
- en
---
# 🦥 Uploaded Model

| **Field**              | **Value**                                                          |
|------------------------|--------------------------------------------------------------------|
| **Developed by**       | **MasterControlAIML**                                              |
| **License**            | Apache 2.0                                                         |
| **Finetuned from**     | `unsloth/Qwen2.5-3B-Instruct`                                      |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="190"/>](https://github.com/unslothai/unsloth)

---
## 🚀 What's New?

> *The protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured** – now with more neurons, zero SFT, and a league of reward functions.*

| Upgrade                | Explanation                                                                         |
|------------------------|-------------------------------------------------------------------------------------|
| **Bigger Backbone**    | 1.5 B → **3 B** Qwen 2.5 for bigger reasoning muscles.                              |
| **Pure RL**            | No supervised fine-tuning; the policy is learned *only* from reward signals (GRPO). |
| **LM-as-Judge**        | A separate LLM rates each candidate for correctness, JSON validity, style, …        |
| **2× Faster Training** | Unsloth's flash-attention & fused ops = less VRAM, more speed.                      |

---
## 🛠️ Intended Use

* Convert messy prose, logs, or audit notes into a pristine JSON document that follows a complex, nested schema.
* Drop-in replacement for any pipeline using the older DeepSeek-R1 1.5 B structurer: just swap the checkpoint and enjoy the headroom (see the sketch below).
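
For example, an existing pipeline only needs its checkpoint string swapped. A minimal sketch (the generation settings are illustrative, not prescriptive):

```python
from transformers import pipeline

# Before: the older 1.5 B structurer
# gen = pipeline("text-generation",
#                model="MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured")

# After: swap in the 3 B GRPO checkpoint - prompts and parsing stay the same
gen = pipeline(
    "text-generation",
    model="MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora",
    max_new_tokens=512,
    do_sample=False,
)
```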
---
## 🧠 How to Use (Reasoning + JSON)

The snippet below:

1. **Primes** the model with the *exact* Pydantic schema, so it outputs the right keys.
2. Makes the model **think step-by-step** (reasoning) but still wraps the final JSON in an easy-to-parse container.
3. Uses the correct repo name: `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora`.
```python
# ─────────────────────────────────────────────────────────────────────────────
# QUICK-START
# Structured-data extraction with reasoning + JSON output
# ─────────────────────────────────────────────────────────────────────────────
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, json, textwrap, inspect
from pydantic import BaseModel
from typing import List, Optional

MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora"

# 1️⃣ Inline schema (keeps the LLM on-rails) ──────────────────────────────────
class MultipleChoice(BaseModel):
    question: str
    options: List[str]
    selected: str

class FormField(BaseModel):
    fieldName: str
    value: str
    notes: Optional[str] = ""

class Calculation(BaseModel):
    formula: str
    result: str
    notes: Optional[str] = ""

class Metadata(BaseModel):
    reportDate: str
    auditorId: Optional[str] = None
    comments: Optional[str] = None

class Table(BaseModel):              # minimal placeholder so the snippet runs standalone
    headers: List[str] = []
    rows: List[List[str]] = []

class Checkbox(BaseModel):           # minimal placeholder so the snippet runs standalone
    label: str = ""
    checked: bool = False

class Content(BaseModel):
    paragraphs: List[str]
    tables: List[Table]
    checkboxes: List[Checkbox]
    multipleChoice: List[MultipleChoice]
    formFields: List[FormField]
    calculations: List[Calculation]
    metadata: Optional[Metadata] = Metadata(reportDate="")

class Section(BaseModel):
    id: str
    title: str
    content: Content

class Document(BaseModel):
    documentTitle: str
    documentDate: str
    sections: List[Section]

# Show the model the whole schema, not just the top-level class
SCHEMA_TEXT = "\n".join(
    inspect.getsource(cls)
    for cls in (MultipleChoice, FormField, Calculation, Metadata,
                Table, Checkbox, Content, Section, Document)
)

# 2️⃣ Build prompts ───────────────────────────────────────────────────────────
SYSTEM_PROMPT = textwrap.dedent(f"""
You are an expert **data-extraction assistant**.
Extract structured info from unstructured text **exactly** following the Pydantic schema.

── Schema ──
{SCHEMA_TEXT}
────────────

Rules:
1. Follow the schema for keys & nesting.
2. Copy values verbatim when possible.
3. If a field is missing, return null.
4. Output your step-by-step reasoning first.
5. Then return ONLY the JSON inside this wrapper:
   final answer[ json object: {{ ... }} ]

Format:
<reasoning>…</reasoning>
<answer>
final answer[ json object: {{ … }} ]
</answer>
""").strip()

UNSTRUCTURED_TEXT = """
12 April 2025 – Onsite audit performed by Jane Smith.
Observations: Two fire extinguishers past expiry; emergency lights functional.
Calculations: Total extinguishers = 8, expired = 2 → 25 % overdue.
"""

USER_PROMPT = textwrap.dedent(f"""
### Task
Convert the following *hier* text to the schema.

### hier
{UNSTRUCTURED_TEXT}
""").strip()

# 3️⃣ Generate ─────────────────────────────────────────────────────────────────
tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
gen = pipeline("text-generation", model=model, tokenizer=tok,
               max_new_tokens=512, do_sample=False,
               return_full_text=False)   # return only the completion, not the prompt

# Let the tokenizer's chat template format the system/user turns the way Qwen 2.5 expects
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
raw_out = gen(prompt)[0]["generated_text"]

# 4️⃣ Slice out the JSON ───────────────────────────────────────────────────────
start = raw_out.find("final answer[")
end = raw_out.rfind("]") + 1
json_text = raw_out[start:end].split("json object:")[-1].strip(" []\n")
data = json.loads(json_text)  # raises ValueError if the JSON is malformed

print(raw_out)                        # reasoning + JSON
print("\n✅ Parsed object:\n", data)
```
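
Because the schema classes are already defined, the parsed dict can be validated in one extra step. A minimal follow-up sketch (assumes Pydantic v2; on v1 use `Document.parse_obj`):

```python
from pydantic import ValidationError

try:
    doc = Document.model_validate(data)   # Pydantic v2
    print(doc.documentTitle, "-", len(doc.sections), "section(s)")
except ValidationError as err:
    print("Output did not match the schema:\n", err)
```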
### Why it Works 🧠

* **Schema-priming** ensures key-level fidelity: no “creative” field names.
* **Chain-of-thought** improves factual extraction (it was rewarded during GRPO).
* The `final answer[…]` wrapper makes downstream parsing a one-liner (see the example below).
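
A minimal parsing sketch, assuming the `final answer[ json object: {...} ]` wrapper shown above (the `raw_out` string here is a toy example):

```python
import re, json

raw_out = '<answer> final answer[ json object: {"documentTitle": "Audit"} ] </answer>'  # example output

# Grab everything between "json object:" and the wrapper's closing "]"
match = re.search(r"final answer\[\s*json object:\s*(\{.*\})\s*\]", raw_out, re.DOTALL)
data = json.loads(match.group(1)) if match else None
print(data)   # {'documentTitle': 'Audit'}
```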
---

## 🏋️ Training Recipe (Condensed)

| Setting        | Value                                                                   |
| -------------- | ----------------------------------------------------------------------- |
| **Algorithm**  | GRPO; policy = this LM, reward LM = `Qwen2.5-7B` w/ JSON-validator head |
| **Epochs**     | 3 (effective)                                                           |
| **Batch**      | Grad-accum 8, bfloat16                                                  |
| **Optimizer**  | Fused AdamW                                                             |
| **Throughput** | ≈ 45 k tokens/s on 8×A100                                               |
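
The full training script is not reproduced here; the block below is a minimal, hypothetical sketch of a GRPO setup with TRL and a JSON-validity reward. The dataset, hyperparameters, and reward are placeholders (the LM-judge reward is omitted), not the exact recipe above:

```python
import json
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: +1.0 when the completion contains a parsable JSON object, else 0.0
def json_validity_reward(completions, **kwargs):
    rewards = []
    for text in completions:
        try:
            json.loads(text[text.find("{"): text.rfind("}") + 1])
            rewards.append(1.0)
        except ValueError:
            rewards.append(0.0)
    return rewards

# Placeholder prompts - the real dataset pairs unstructured text with the schema prompt
train_ds = Dataset.from_dict({"prompt": ["Convert this audit note to JSON: ..."] * 64})

config = GRPOConfig(
    output_dir="qwen2.5-3b-grpo-json",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # matches the grad-accum setting in the table
    num_generations=4,               # candidates per prompt scored by the reward
    bf16=True,
    num_train_epochs=3,
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="unsloth/Qwen2.5-3B-Instruct",
    reward_funcs=json_validity_reward,
    args=config,
    train_dataset=train_ds,
)
trainer.train()
```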
---

## 📊 Evaluation (WIP)

| Metric                    | Status |
| ------------------------- | ------ |
| Exact-Match JSON Accuracy | 🔜     |
| Structural F1             | 🔜     |
| Valid-JSON Rate           | 🔜     |
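
Until official numbers land, the first and last metrics can be approximated locally. A rough sketch (assumes `predictions` and `references` are parallel lists of JSON strings; Structural F1 needs a key-level comparison and is omitted):

```python
import json

def quick_eval(predictions, references):
    valid, exact = 0, 0
    for pred, ref in zip(predictions, references):
        try:
            pred_obj = json.loads(pred)
        except ValueError:
            continue                       # not valid JSON
        valid += 1
        if pred_obj == json.loads(ref):    # key-order-insensitive comparison
            exact += 1
    n = max(len(predictions), 1)
    return {"valid_json_rate": valid / n, "exact_match_accuracy": exact / n}
```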
Stay tuned – numbers landing faster than you can say “schema validation.” 🛰️

---
## 🤗 Citation

```bibtex
@misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,
  title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5-3B for JSON structuring},
  author       = {MasterControlAIML},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Lora}}
}
```

*May your JSON always parse and your losses always converge!* 🚀
|