dumbequation/Qwen2.5-3B-reasoning-medical-symptoms-GRPO-f16-GGUF

text-generation-inference

Inference Endpoints


Uploaded model

Developed by: dumbequation
License: apache-2.0
Finetuned from model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Downloads last month: 158

GGUF

Model size: 3.09B params
Architecture: qwen2
Precision: 16-bit
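As a rough guide to what the figures above imply for storage, the weight footprint can be estimated from the parameter count and the 16-bit precision. This is a back-of-the-envelope sketch, assuming every parameter is stored as a 2-byte float; the actual GGUF file will differ slightly because of metadata and the rounded parameter count, and running the model also needs memory for the KV cache and activations.

```python
# Back-of-the-envelope estimate of weight storage for a 16-bit model.
# Assumption: all 3.09B parameters are stored as 2-byte (f16) values.
PARAMS = 3.09e9          # parameter count listed on the card
BYTES_PER_PARAM = 2      # 16 bits = 2 bytes

size_bytes = PARAMS * BYTES_PER_PARAM
size_gb = size_bytes / 1e9       # decimal gigabytes
size_gib = size_bytes / 2**30    # binary gibibytes

print(f"Approximate weight size: {size_gb:.2f} GB ({size_gib:.2f} GiB)")
```

Under these assumptions the weights alone come to roughly 6.2 GB, which is a useful lower bound when choosing hardware for local inference.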


Collection including dumbequation/Qwen2.5-3B-reasoning-medical-symptoms-GRPO-f16-GGUF

Reasoning Work

Models I've trained to think like DeepSeek R1 using online learning: Group Relative Policy Optimization (GRPO), introduced by DeepSeekMath. 6 items.
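For readers unfamiliar with GRPO, its central idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that group-relative normalization only. It is not the training code used for this model, and the function name, the `eps` constant, the use of the population standard deviation, and the example rewards are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is a hypothetical small constant that avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average receive a positive advantage and are reinforced, while those below it receive a negative one. When all completions in a group score the same, every advantage is zero and the group contributes no learning signal.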