bhaviktheslider committed on
Commit 6f17c01 · verified · 1 Parent(s): 22aa838

Update README.md

Files changed (1)
  1. README.md +123 -29

README.md CHANGED
@@ -1,3 +1,4 @@
+
  ---
  # 🦄 Model Card
  base_model: unsloth/Qwen2.5-3B-Instruct
@@ -22,57 +23,145 @@ language:
  | **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
  | **Training Framework**| [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="180"/>](https://github.com/unslothai/unsloth)

  ---

  ## 🚀 What’s New?

- > **TL;DR** Think of this model as the beefed-up, protein-shake-powered sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured** … except we ditched the SFT and let a squad of reward functions do the coaching.
-
- ### Key Upgrades
- 1. **Larger Backbone** – We jumped from a 1.5 B-parameter model to a 3 B-parameter **Qwen 2.5** variant for more representational oomph.
- 2. **No SFT, All 🍬 Rewards** – Instead of supervised fine-tuning, training relied solely on reward-based optimization (GRPO).
-    - **LM-as-Judge**: A language model scored candidate outputs for task quality.
-    - **Auxiliary Rewards**: Style, length, and JSON-validity rewards kept the model on its best behavior.
- 3. **2× Faster Training** – Courtesy of Unsloth’s memory-efficient tricks (flash attention + fused optimizers).
+ Think of this as the protein-shake sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured**—but now with **3 B parameters, zero SFT, and a reward-only training regime** (GRPO) backed by an LM judge plus auxiliary reward functions.
+
+ | Upgrade | Explanation |
+ |---------|-------------|
+ | **Bigger Backbone** | 1.5 B → **3 B** Qwen 2.5 for deeper reasoning headroom. |
+ | **Pure RL** | No supervised fine-tuning—the policy learned *entirely* from reward signals. |
+ | **LM-as-Judge** | A separate LLM scores each candidate for correctness, JSON validity, length, and style. |
+ | **2× Faster Training** | Courtesy of Unsloth’s memory savings (flash attention, fused ops). |

  ---

  ## 🛠️ Intended Use

- - Converts messy, free-form text into structured JSON—exactly like its 1.5 B predecessor, but with a deeper knowledge reservoir and reinforcement-tuned precision.
- - Drop-in replacement for any pipeline already using the DeepSeek-R1 model. Just swap checkpoints and enjoy the headroom.
+ Structured-data extraction from messy prose, logs, or transcripts.
+ Drop-in upgrade for any pipeline currently using the older 1.5 B DeepSeek-R1 JSON structurer.

  ---

- ## 🏋️ Training Details
-
- | Item | Value |
- |------|-------|
- | **Base Model** | `unsloth/Qwen2.5-3B-Instruct` |
- | **Batching** | Gradient Accumulation 8, bfloat16 |
- | **Optimizer** | AdamW (fused) |
- | **Algorithm** | GRPO (policy ≈ LM; reward model ≈ separate LM judge) |
- | **Epochs** | 3 (effective) |
- | **Speed** | ~2× faster vs. vanilla PyTorch thanks to Unsloth |
-
- ---
-
- ## 📊 Evaluation (Coming Soon)
-
- We’re benchmarking against:
- - Exact-match JSON accuracy
- - Structural F1
- - Valid-JSON rate
-
- …stay tuned—numbers arriving faster than you can say “schema validation.”
-
- ---
-
- ## 🤝 Citation
-
- If you build something cool with this model, a shout-out would be lovely:
+ ## 🔧 How to Use
+
+ Below is a minimal example that **re-uses the exact prompt format** from the previous model.
+ The model first *reasons* silently (the chain of thought is kept internal) and then emits only the target JSON.
+
+ > **Model name** used in the snippet → `MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit`
+ > Replace it with your own repo path (e.g. `bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer`) if you use a different copy.
+
+ ### 1️⃣ Transformers Quick-Start
+
+ ```python
+ import json
+ import textwrap
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+ MODEL = "MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit"
+
+ tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # --- Prompt (identical structure to the previous model) ---
+ system_prompt = (
+     "You are an intelligent JSON conversion engine. "
+     "Think step-by-step, and then output the final valid JSON."
+ )
+
+ task_prompt = textwrap.dedent("""\
+     ### Task
+     Convert the following unstructured text into the JSON schema shown below.
+     Return *only* valid JSON.
+
+     ### Schema
+     {
+       "name": str,
+       "age": int,
+       "city": str,
+       "skills": [str]
+     }
+
+     ### Unstructured text
+     John Doe, a 28-year-old software engineer living in Austin, loves Python and Golang.
+ """)
+
+ generator = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tok,
+     max_new_tokens=256,
+     do_sample=False,
+ )
+
+ # return_full_text=False strips the prompt so only the completion remains
+ output = generator(
+     f"<|system|>\n{system_prompt}\n<|user|>\n{task_prompt}",
+     return_full_text=False,
+ )[0]["generated_text"]
+
+ data = json.loads(output)  # ✅ raises if the output isn’t valid JSON
+ print(data)
+ ```
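+
+ If you'd rather not hand-roll the `<|system|>` / `<|user|>` tags, the tokenizer's built-in chat template produces an equivalent request. A minimal sketch, assuming this checkpoint keeps the stock Qwen2.5 chat template (it continues the snippet above):
+
+ ```python
+ # Same call via the tokenizer's chat template (assumes the stock Qwen2.5 template).
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": task_prompt},
+ ]
+ prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ output = generator(prompt, return_full_text=False)[0]["generated_text"]
+ print(json.loads(output))
+ ```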
+
+ ### 2️⃣ Text-Generation-Inference (TGI)
+
+ ```bash
+ # start the server (add 8-bit, BF16, etc. flags as needed); --port matches the curl below
+ text-generation-launcher --model-id MasterControlAIML/DeepSeek-R1-Qwen2.5-3b-LLM-Judge-Reward-JSON-Unstructured-To-Structured-Merged-Lora-16bit --port 8080
+
+ # query it
+ curl http://localhost:8080/generate \
+     -H 'Content-Type: application/json' \
+     -d '{
+         "inputs": "<|system|>\nYou are an intelligent JSON conversion engine. Think step-by-step, but ONLY output the final valid JSON.\n<|user|>\n### Task\nConvert the following unstructured text into the JSON schema shown below.\nReturn *only* valid JSON.\n\n### Schema\n{\"title\": str, \"authors\": [str], \"year\": int}\n\n### Unstructured text\n\"Deep Learning\" was written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville in 2016.\n",
+         "parameters": {"max_new_tokens": 256, "do_sample": false}
+     }'
+ ```
+
+ The `generated_text` field of the response is a pure JSON string, e.g.:
+
+ ```json
+ {"title": "Deep Learning", "authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"], "year": 2016}
+ ```
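+
+ To enforce the schema programmatically, a minimal sketch with the third-party `jsonschema` package (our assumption; it isn't bundled with this model) could look like this:
+
+ ```python
+ # Sketch: validate model output against a JSON Schema mirroring the curl example.
+ # Requires `pip install jsonschema`.
+ import json
+ from jsonschema import ValidationError, validate
+
+ schema = {
+     "type": "object",
+     "properties": {
+         "title": {"type": "string"},
+         "authors": {"type": "array", "items": {"type": "string"}},
+         "year": {"type": "integer"},
+     },
+     "required": ["title", "authors", "year"],
+ }
+
+ raw = '{"title": "Deep Learning", "authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"], "year": 2016}'
+ try:
+     validate(json.loads(raw), schema)
+     print("schema-valid ✅")
+ except (json.JSONDecodeError, ValidationError) as err:
+     print(f"rejected ❌: {err}")
+ ```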
+
+ ---
+
+ ## 🤖 Why This Prompt Works
+
+ 1. **System role** instructs the model to plan internally and expose *only* the JSON.
+ 2. **Schema block** constrains the output keys & types.
+ 3. **GRPO training** rewarded strict adherence to schema validity and penalized hallucinated keys or malformed JSON.
+ 4. **LM-as-Judge** provides dense shaping signals: structural accuracy, content fidelity, token length, even stylistic consistency (see the illustrative judge sketch below).
+
+ The result: *reliable one-shot structuring without post-processing hacks*.
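+
+ For intuition only, here is a toy sketch of the LM-as-judge idea: the judge model, prompt, and rubric are illustrative assumptions, **not** the actual reward stack used in training.
+
+ ```python
+ # Toy judge: score a candidate JSON 0–10 with a separate LLM, normalized to [0, 1].
+ import re
+ from transformers import pipeline
+
+ judge = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")  # hypothetical judge
+
+ def judge_score(source_text: str, candidate_json: str) -> float:
+     prompt = (
+         "Rate from 0 to 10 how faithfully the JSON captures the text. "
+         "Reply with a single number.\n"
+         f"Text: {source_text}\nJSON: {candidate_json}\nScore:"
+     )
+     reply = judge(prompt, max_new_tokens=8, return_full_text=False)[0]["generated_text"]
+     match = re.search(r"\d+(\.\d+)?", reply)
+     return min(float(match.group()), 10.0) / 10.0 if match else 0.0
+ ```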
+
+ ---
+
+ ## 🏋️ Training Recipe (Condensed)
+
+ | Setting | Value |
+ |---------|-------|
+ | **Algorithm** | GRPO (policy ≈ LM; reward LM ≈ `Qwen2.5-7B` w/ JSON validator head) |
+ | **Effective Epochs** | 3 |
+ | **Batching** | Gradient accumulation 8, bfloat16 |
+ | **Optimizer** | Fused AdamW |
+ | **Throughput** | ~45 k tokens/s on 8×A100 |
+
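+ The exact training script isn't published; as a rough, hedged sketch, reward-only GRPO with simple auxiliary rewards can be wired up in TRL roughly like this (the reward functions and toy dataset are our assumptions, not the actual ones used):
+
+ ```python
+ # Sketch: GRPO in TRL with deterministic auxiliary rewards (illustrative).
+ # Each reward function maps a batch of completions to one float per completion.
+ import json
+ from datasets import Dataset
+ from trl import GRPOConfig, GRPOTrainer
+
+ def valid_json_reward(completions, **kwargs):
+     # +1 if the completion parses as JSON, else -1.
+     scores = []
+     for text in completions:
+         try:
+             json.loads(text)
+             scores.append(1.0)
+         except json.JSONDecodeError:
+             scores.append(-1.0)
+     return scores
+
+ def length_reward(completions, **kwargs):
+     # Mild penalty against rambling outputs.
+     return [-0.001 * len(text) for text in completions]
+
+ dataset = Dataset.from_dict({"prompt": ["Convert to JSON: John Doe, 28, Austin."]})  # toy
+
+ trainer = GRPOTrainer(
+     model="unsloth/Qwen2.5-3B-Instruct",
+     reward_funcs=[valid_json_reward, length_reward],  # an LM judge could be a third entry
+     args=GRPOConfig(output_dir="grpo-json", bf16=True, gradient_accumulation_steps=8),
+     train_dataset=dataset,
+ )
+ trainer.train()
+ ```
+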
+ ---
+
+ ## 📊 Planned Eval
+
+ * **Exact-Match JSON Accuracy**
+ * **Structural F1**
+ * **Valid-JSON Rate**
+
+ Benchmarks incoming—watch this space. 🛰️ (A sketch of the metric definitions follows.)
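+
+ Until official numbers land, here is a hedged sketch of how the three metrics can be computed over (prediction, reference) JSON-string pairs; the helper is ours, not an official eval harness:
+
+ ```python
+ # Sketch: valid-JSON rate, exact-match accuracy, and a structural F1 computed
+ # over flattened (path, value) pairs. The pair format is an assumption.
+ import json
+
+ def flatten(obj, prefix=""):
+     # Yield (path, leaf value) pairs from nested JSON.
+     if isinstance(obj, dict):
+         for key, val in obj.items():
+             yield from flatten(val, f"{prefix}{key}.")
+     elif isinstance(obj, list):
+         for i, val in enumerate(obj):
+             yield from flatten(val, f"{prefix}{i}.")
+     else:
+         yield (prefix.rstrip("."), obj)
+
+ def evaluate(pairs):
+     valid = exact = 0
+     f1_total = 0.0
+     for pred_str, ref_str in pairs:
+         try:
+             pred = json.loads(pred_str)
+         except json.JSONDecodeError:
+             continue  # invalid JSON scores zero on all three metrics
+         valid += 1
+         ref = json.loads(ref_str)
+         exact += int(pred == ref)
+         p, r = set(flatten(pred)), set(flatten(ref))
+         hits = len(p & r)
+         if hits:
+             prec, rec = hits / len(p), hits / len(r)
+             f1_total += 2 * prec * rec / (prec + rec)
+     n = len(pairs)
+     return {
+         "valid_json_rate": valid / n,
+         "exact_match": exact / n,
+         "structural_f1": f1_total / n,
+     }
+
+ print(evaluate([('{"a": 1}', '{"a": 1}')]))  # all three metrics = 1.0
+ ```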
+
+ ---
+
+ ## 🤝 Citation
+
  ```bibtex
  @misc{bhaviktheslider_2025_unsloth_qwen2.5_3b_grpo,

@@ -80,5 +169,10 @@ If you build something cool with this model, a shout-out would be lovely:
  author = {Bhaviktheslider},
  year = {2025},
  howpublished = {Hugging Face},
- note = {https://huggingface.co/bhaviktheslider/<repo>}
+ note = {\url{https://huggingface.co/bhaviktheslider/unsloth-qwen2.5-3b-grpo-json-structurer}}
  }
+ ```
+
+ *May your JSON always parse and your losses always converge!* 😎