Gemma-3-4b Reasoning R1 Model Card
Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO (Group Relative Policy Optimization), following the DeepSeek-R1 methodology. This model card describes the instruction-tuned version, which is optimized specifically for reasoning tasks.
The entire Gemma-3-4b Reasoning family is available under a permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible.
Model Details
Description
Gemma-3-4b Reasoning is a fine-tuned model focused on structured, logical problem solving and mathematical reasoning. It was trained on the GSM8K dataset using GRPO to improve its ability to reason step by step and to give structured explanations.
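To make the GRPO step more concrete, here is a minimal sketch of the group-relative advantage that gives the method its name: several completions are sampled for the same prompt, each is scored, and every reward is normalized against the mean and spread of its own group. This is an illustration only, not the training code used for this model. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of completions sampled for one prompt.

    Hypothetical helper for illustration: each reward is centered on the
    group mean and scaled by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for the same prompt: two correct (1.0), two incorrect (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average get a positive advantage and are reinforced, while those below average get a negative one, so no separate value model is needed to provide a baseline.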
Training Dataset
- GSM8K (English): A dataset of grade-school math word problems that require multi-step reasoning.
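GSM8K reference solutions end with a line of the form `#### <final answer>`, which makes it straightforward to score a completion automatically. The sketch below shows one possible correctness reward built on that format. It is not the reward function used to train this model; the function names and the example solution string are hypothetical.

```python
import re


def extract_gold_answer(solution: str) -> str:
    """Return the text after the final '####' marker, without thousands separators."""
    return solution.split("####")[-1].strip().replace(",", "")


def extract_last_number(text: str):
    """Return the last number that appears in the text, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return matches[-1].replace(",", "") if matches else None


def correctness_reward(completion: str, solution: str) -> float:
    """Give 1.0 if the last number in the completion equals the gold answer, else 0.0."""
    predicted = extract_last_number(completion)
    if predicted is None:
        return 0.0
    try:
        return 1.0 if float(predicted) == float(extract_gold_answer(solution)) else 0.0
    except ValueError:
        return 0.0


# Hypothetical solution written in the GSM8K answer format.
solution = "60 / 3 = 20 km per hour\n20 * 5 = 100 km\n#### 100"
print(correctness_reward("Speed is 60 / 3 = 20 km/h, so he travels 20 * 5 = 100 km.", solution))
print(correctness_reward("He travels 300 km.", solution))
```

Taking the last number in the completion is a deliberately simple heuristic; a stricter reward could require the answer to appear inside a dedicated answer tag.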
Intended Use
Direct Use
The model is specifically designed for structured reasoning tasks, including:
- Mathematical and logical reasoning
- Multi-step problem solving
- Instruction-based reasoning
Out-of-scope Use
This model should not be used for unethical, malicious, or illegal activities.
How to Use
The model uses structured XML templates for dialogue and reasoning tasks. The example below loads it with transformers and generates an answer to a word problem:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ericrisco/gemma-3-4b-reasoning"
prompt = "A cyclist travels 60 km in 3 hours at a constant speed. If he maintains the same speed, how many kilometers will he travel in 5 hours?"

# Load the tokenizer and the model in bfloat16; device_map="auto" places the weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

# Format the user message with the model's chat template.
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Send the inputs to the device the model was placed on instead of hard-coding "cuda".
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)

# The decoded text contains the prompt followed by the generated answer.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
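Because the response is structured with XML-style tags, the final answer can be separated from the reasoning with a small parser. This card does not list the exact tag names, so `<reasoning>` and `<answer>` below are assumptions made for illustration, as are the helper name and the sample string. Check the model's actual output before relying on them.

```python
import re


def extract_tag(text: str, tag: str):
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# Hypothetical response; the tag names are assumed, not taken from the model card.
sample = (
    "<reasoning>\n"
    "Speed = 60 / 3 = 20 km/h. Distance = 20 * 5 = 100 km.\n"
    "</reasoning>\n"
    "<answer>\n"
    "100\n"
    "</answer>"
)

print(extract_tag(sample, "reasoning"))
print(extract_tag(sample, "answer"))
```

Returning `None` when a tag is missing lets calling code distinguish a malformed response from an empty answer.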
Performance
The Gemma-3-4b Reasoning model produces explicit Chain-of-Thought (CoT) reasoning, giving detailed explanations and structured solutions on reasoning tasks. This model card does not report quantitative benchmark results.
Limitations
The model is primarily optimized for numeric and structured reasoning and might produce less accurate or unexpected results when applied to unrelated tasks.
Citations
- Gemma Multimodal Reasoning Model by Google
- GRPO Implementation by TRL
Author
Eric Risco