dumbequation/Qwen2.5-3B-reasoning-medical-symptoms-GRPO-quant
Models I've trained to reason like DeepSeek-R1 using online reinforcement learning with Group Relative Policy Optimization (GRPO), the algorithm introduced in DeepSeekMath.
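For readers unfamiliar with GRPO, its core idea can be sketched briefly. For each prompt, the policy samples a group of completions. Each completion gets a scalar reward, and its advantage is that reward normalized against the group's own mean and standard deviation, so no separate value (critic) model is needed. The advantage then feeds a PPO-style clipped surrogate objective. The sketch below is a minimal illustration under stated assumptions, not the training code used for these models. It assumes one outcome-level reward per completion and uses the population standard deviation plus a small epsilon. It also omits the KL penalty against a reference model. The function names `group_relative_advantages` and `clipped_surrogate` are hypothetical.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Assumption: one scalar outcome reward per completion, population std,
    and a small eps so a group with identical rewards yields zero advantages.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # Above-average completions get a positive advantage, below-average a negative one.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one token/sample (to be maximized).

    `ratio` is pi_new(token) / pi_old(token); clip_eps=0.2 is an illustrative default.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Taking the min gives a pessimistic bound that discourages large policy updates.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four completions for one prompt: two judged correct (1.0), two incorrect (0.0).
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print([round(a, 3) for a in advantages])
    print(clipped_surrogate(ratio=1.5, advantage=advantages[0]))
```

Because advantages are computed relative to siblings from the same prompt, a group in which every completion receives the same reward contributes no learning signal. This is one reason reward design (for example, partial credit for format) matters in practice.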