Gemma-3-4b Reasoning R1 Model Card

Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO (Group Relative Policy Optimization), following the DeepSeek-R1 methodology. This model card describes the instruction-tuned version, optimized specifically for reasoning tasks.

The entire Gemma-3-4b Reasoning family is available under a permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible.

Model Details

Description

Gemma-3-4b Reasoning is a reasoning-focused fine-tuned model designed to excel in structured, logical problem-solving and mathematical reasoning. The training was performed on the GSM8K dataset using GRPO, enhancing the model's ability to reason step-by-step and provide structured explanations.
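
As a rough illustration of the idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for one prompt, each receives a reward, and each reward is normalized against the mean and standard deviation of its group. This is not the training code used for this model; the function name, the `eps` value, and the choice of population standard deviation are assumptions made for the example, and implementations differ on these details.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the other completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group (an assumption)
    # eps keeps the division defined when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt: two judged correct (1.0), two judged wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Completions that score above the group average get a positive advantage and are reinforced; those below get a negative one. When all completions in a group receive the same reward, every advantage is zero and the group contributes no learning signal.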

Training Dataset

GSM8K (English): A dataset of grade-school math word problems that require multi-step reasoning.
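
GSM8K reference solutions end with a line of the form `#### <final answer>`, which makes it straightforward to extract a gold answer for checking model output. The helper below is a minimal sketch of that extraction; the function name is hypothetical, and the sample record is written by hand in the GSM8K format rather than copied from the dataset.

```python
import re

def extract_gsm8k_answer(solution: str) -> str:
    """Return the final answer that follows the '####' marker, without thousands separators."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", solution)
    if match is None:
        raise ValueError("no '####' answer marker found")
    return match.group(1).replace(",", "")

# Illustrative record in the GSM8K solution format (not taken from the dataset).
sample_solution = (
    "The speed is 60/3 = <<60/3=20>>20 km/h.\n"
    "In 5 hours he travels 20*5 = <<20*5=100>>100 km.\n"
    "#### 100"
)
gold = extract_gsm8k_answer(sample_solution)
print(gold)
```

Comparing this string with the answer parsed from a model completion gives a simple correctness signal of the kind a reasoning-focused reward function could use.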

Intended Use

Direct Use

The model is specifically designed for structured reasoning tasks, including:

Mathematical and logical reasoning
Multi-step problem solving
Instruction-based reasoning

Out-of-scope Use

This model should not be used for malicious activities or in any way that breaches legal or ethical standards.

How to Use

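The model uses structured XML templates for dialogue and reasoning tasks. The example below loads the model with Transformers and generates a response to a single user prompt: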

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ericrisco/gemma-3-4b-reasoning"

prompt = "A cyclist travels 60 km in 3 hours at a constant speed. If he maintains the same speed, how many kilometers will he travel in 5 hours?"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

messages = [{"role": "user", "content": prompt}]

input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

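# Move the inputs to whichever device device_map="auto" placed the model on
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# outputs[0] contains the prompt tokens followed by the generated tokens
response = tokenizer.decode(outputs[0], skip_special_tokens=True)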

print(response)
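
Because the output is structured with XML tags, the reasoning and the final answer can be separated with a small parser. The sketch below assumes the tags are named `<reasoning>` and `<answer>`; this card does not state the tag names, so check them against an actual response before relying on them. The helper name and the sample text are also hypothetical.

```python
import re

def extract_tag(text: str, tag: str):
    """Return the stripped content of the first <tag>...</tag> pair, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

# Hand-written stand-in for a model response; the tag names are assumed.
sample_response = (
    "<reasoning>\n"
    "Speed = 60 km / 3 h = 20 km/h. Distance in 5 h = 20 * 5 = 100 km.\n"
    "</reasoning>\n"
    "<answer>\n100\n</answer>"
)
print(extract_tag(sample_response, "answer"))
```

To use it with the snippet above, pass `response` in place of `sample_response`. Returning `None` for a missing tag lets the caller detect responses that were cut off by `max_new_tokens` before the closing tag was generated.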

Performance

The Gemma-3-4b Reasoning model exhibits robust internal Chain-of-Thought (CoT) capabilities, consistently demonstrating detailed explanations and structured problem-solving skills across reasoning tasks.

Limitations

The model is primarily optimized for numeric and structured reasoning and might produce less accurate or unexpected results when applied to unrelated tasks.

Citations

Gemma Multimodal Reasoning Model by Google
GRPO Implementation by TRL

Author

Eric Risco
