bhaviktheslider committed
Commit 22aa838 · verified · 1 Parent(s): a2fc555

Update README.md

Files changed (1):
  1. README.md +69 -8
README.md CHANGED
@@ -1,23 +1,84 @@
  ---
- base_model: unsloth/Qwen2.5-7B-Instruct
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - trl
- - grpo
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** bhaviktheslider
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/Qwen2.5-7B-Instruct
-
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
+ base_model: unsloth/Qwen2.5-3B-Instruct
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - trl
+ - grpo # Group Relative Policy Optimization
  license: apache-2.0
  language:
  - en
  ---
+
+ # 🦄 Model Card

+ ## 📦 Uploaded Model

+ | **Field** | **Value** |
+ |------------------------|--------------------------------------------|
+ | **Developed by** | **bhaviktheslider** |
+ | **License** | Apache 2.0 |
+ | **Finetuned from** | `unsloth/Qwen2.5-3B-Instruct` |
+ | **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) × Hugging Face TRL |

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+ ---
+
+ ## 🚀 What’s New?
+
+ > **TL;DR** – Think of this model as the beefed-up, protein-shake-powered sequel to **MasterControlAIML/DeepSeek-R1-Qwen2.5-1.5b-SFT-R1-JSON-Unstructured-To-Structured** … except we ditched SFT and let a squad of reward functions do the coaching.
+
+ ### Key Upgrades
+ 1. **Larger Backbone** – We jumped from a 1.5B-parameter model to a 3B-parameter **Qwen 2.5** variant for more representational oomph.
+ 2. **No SFT, All 🍬 Rewards** – Instead of supervised fine-tuning, training relied solely on reward-based optimization (GRPO); see the sketch after this list.
+    - **LM-as-Judge**: A language model scored candidate outputs for task quality.
+    - **Auxiliary Rewards**: Style, length, and JSON-validity rewards kept the model on its best behavior.
+ 3. **2× Faster Training** – Courtesy of Unsloth’s memory-efficient tricks (flash attention + fused optimizers).
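+
+ To make the reward setup concrete, here is a minimal sketch of one auxiliary reward, the JSON-validity check. It is illustrative rather than the actual training code (which isn’t published in this card); the function name is ours, and the signature follows the reward-function convention of recent TRL `GRPOTrainer` releases: take the batch of completions, return one float per completion.
+
+ ```python
+ import json
+
+ def json_validity_reward(completions, **kwargs):
+     """Score +1.0 for completions that parse as JSON, -1.0 otherwise."""
+     rewards = []
+     for completion in completions:
+         try:
+             json.loads(completion)
+             rewards.append(1.0)   # well-formed JSON
+         except (json.JSONDecodeError, TypeError):
+             rewards.append(-1.0)  # malformed output
+     return rewards
+ ```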
+
+ ---
+
+ ## 🛠️ Intended Use
+
+ - Converts messy, free-form text into structured JSON, exactly like its 1.5B predecessor, but with a deeper knowledge reservoir and reinforcement-tuned precision.
+ - Drop-in replacement for any pipeline already using the DeepSeek-R1 model: just swap checkpoints and enjoy the headroom (see the usage sketch below).
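+
+ A minimal usage sketch with 🤗 Transformers. The repo id is left as a placeholder (it isn’t pinned down in this card), and the prompt is only an example; adjust both for your setup.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "bhaviktheslider/<repo>"  # placeholder: substitute the actual checkpoint name
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+
+ messages = [
+     {"role": "system", "content": "Convert the user's text into valid JSON."},
+     {"role": "user", "content": "Order #1234: two lattes, one muffin, pickup at 9am."},
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ output_ids = model.generate(input_ids, max_new_tokens=256)
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```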
+
+ ---
+
+ ## 🏋️ Training Details
+
+ | Item | Value |
+ |------|-------|
+ | **Base Model** | `unsloth/Qwen2.5-3B-Instruct` |
+ | **Batching** | Gradient accumulation 8, bfloat16 |
+ | **Optimizer** | AdamW (fused) |
+ | **Algorithm** | GRPO (policy ≈ LM; reward model ≈ separate LM judge) |
+ | **Epochs** | 3 (effective) |
+ | **Speed** | ~2× faster vs. vanilla PyTorch, thanks to Unsloth |
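+
+ For context, a hedged sketch of how these settings could be wired together with TRL’s `GRPOTrainer`, reusing the `json_validity_reward` from the earlier sketch. The dataset file and the reward list are hypothetical; the actual training script isn’t included here.
+
+ ```python
+ from datasets import load_dataset
+ from trl import GRPOConfig, GRPOTrainer
+
+ # Hypothetical prompt dataset; GRPOTrainer expects a "prompt" column.
+ dataset = load_dataset("json", data_files="unstructured_to_json.jsonl", split="train")
+
+ config = GRPOConfig(
+     output_dir="qwen2.5-3b-grpo-json",
+     gradient_accumulation_steps=8,  # matches the table above
+     bf16=True,
+     num_train_epochs=3,
+     optim="adamw_torch_fused",      # fused AdamW
+ )
+
+ trainer = GRPOTrainer(
+     model="unsloth/Qwen2.5-3B-Instruct",
+     reward_funcs=[json_validity_reward],  # plus judge/style/length rewards in practice
+     args=config,
+     train_dataset=dataset,
+ )
+ trainer.train()
+ ```
+
+ Unsloth’s speedups layer on top of this same trainer via its patched model classes, which is where the ~2× figure comes from.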
+
+ ---
+
+ ## 📊 Evaluation (Coming Soon)
+
+ We’re benchmarking against the following metrics (two of them are sketched below):
+ - Exact-match JSON accuracy
+ - Structural F1
+ - Valid-JSON rate
+
+ Stay tuned: numbers are arriving faster than you can say “schema validation.”
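+
+ In the meantime, here is how two of those metrics could be computed. This is our assumed formulation; the eventual eval harness may define them differently.
+
+ ```python
+ import json
+
+ def exact_match(pred: str, gold: str) -> bool:
+     """True when both strings parse as JSON and compare equal after parsing."""
+     try:
+         return json.loads(pred) == json.loads(gold)
+     except json.JSONDecodeError:
+         return False
+
+ def valid_json_rate(preds: list[str]) -> float:
+     """Fraction of predictions that parse as valid JSON."""
+     def is_valid(s: str) -> bool:
+         try:
+             json.loads(s)
+             return True
+         except json.JSONDecodeError:
+             return False
+     return sum(is_valid(p) for p in preds) / len(preds) if preds else 0.0
+ ```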
+
+ ---
+
+ ## 🤝 Citation
+
+ If you build something cool with this model, a shout-out would be lovely:
+
+ ```bibtex
+ @misc{bhaviktheslider_2025_unsloth_qwen2_5_3b_grpo,
+   title        = {An Unsloth-accelerated GRPO-trained Qwen 2.5 3B for JSON structuring},
+   author       = {Bhaviktheslider},
+   year         = {2025},
+   howpublished = {Hugging Face},
+   note         = {https://huggingface.co/bhaviktheslider/<repo>}
+ }
+ ```