R1 data distilled onto google/gemma-3-4b-it

Inference test code

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch

model_id = "gghfez/R1-Gemma-3-4B-multimodal-test"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto"
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=500, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
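
The `messages` list above follows a fixed shape: one system turn, then one user turn whose content mixes an image entry and a text entry. As a minimal sketch, a small helper can build that structure for other prompts. The name `build_messages` and its parameters are hypothetical and not part of the model card or of transformers. Only the image-plus-text form is shown in the original; the text-only branch is an assumption that has not been checked against this fine-tune.

```python
from typing import Optional

# URL taken from the inference example above.
BEE_URL = (
    "https://huggingface.co/datasets/huggingface/"
    "documentation-images/resolve/main/bee.jpg"
)


def build_messages(
    prompt: str,
    image: Optional[str] = None,
    system: str = "You are a helpful assistant.",
) -> list:
    """Build a chat message list in the same shape as the example above.

    Hypothetical helper for illustration only.
    """
    user_content = []
    if image is not None:
        # Same entry format as the original example: {"type": "image", "image": ...}
        user_content.append({"type": "image", "image": image})
    user_content.append({"type": "text", "text": prompt})
    return [
        {"role": "system", "content": [{"type": "text", "text": system}]},
        {"role": "user", "content": user_content},
    ]


messages = build_messages("Describe this image in detail.", image=BEE_URL)
```

The result can be passed to `processor.apply_chat_template` exactly as in the code above.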

Gemma-3-4b response

**Overall Impression:** The image is a close-up shot of a vibrant garden scene, 
focusing on a cluster of pink cosmos flowers and a busy bumblebee. 
It has a slightly soft, natural feel, likely captured in daylight.

gghfez/R1-Gemma-3-4B-multimodal-test response

<think>
Okay, let's start by looking at the image. The main elements are the pink cosmos flowers and a bee on one of them. The background has some green foliage and other flowers.
First, the foreground is dominated by the pink cosmos. The bee is on one of the petals, so I should describe its position and appearance. The bee has black and yellow stripes, and its body is fuzzy. It's interacting with the flower, maybe collecting pollen.
Next, there are other cosmos flowers in the background, some with buds. The colors are mostly pink and purple. There's also a red flower with a prominent petal, which adds a pop of color.
The leaves in the background are green and some have a textured appearance. The lighting seems natural, maybe a bit overcast since the colors aren't too vibrant.
I should mention the composition, how the bee is the focal point, and the other flowers provide context. The depth of field is shallow, so the bee is in focus while the background is slightly blurred.
Overall, the image captures a moment in a garden, with the bee's activity highlighting the ecosystem. The colors are soft and natural, creating a peaceful scene.
</think>

**Image Description:**

The photograph captures a serene garden scene, focusing on a pink cosmos flower with a bee as the central subject. The cosmos, with its layered petals and delicate texture, dominates the foreground. A fuzzy, black-and-yellow bee is perched on one of the petals, its body dusted with pollen, suggesting it's actively foraging.

In the background, several other cosmos flowers—some in full bloom, others with buds—add depth and color variation. A vibrant red flower with a prominent petal stands out, contrasting with the pink hues. The foliage behind the flowers is lush and green, with some leaves exhibiting a textured, almost velvety appearance.

The lighting appears soft and natural, with subtle shadows that highlight the flower's contours. The composition is balanced, with the bee's placement drawing the eye while the surrounding blooms provide context. The shallow depth of field keeps the bee sharp, blurring the background to emphasize the moment.

Overall, the image evokes a sense of tranquility and the interconnectedness of nature, with the bee's presence symbolizing pollination and the garden's vitality.
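
As the response above shows, the fine-tuned model emits its reasoning between `<think>` and `</think>` before the final answer, and the tags survive decoding with `skip_special_tokens=True`. A minimal sketch for separating the two parts of the decoded string follows. The helper name `split_reasoning` is hypothetical, and the fallback for a missing closing tag is an assumption about what might happen if generation stops at `max_new_tokens` mid-reasoning.

```python
import re

# Matches the first complete <think>...</think> block, across newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer) from a decoded generation.

    Hypothetical helper for illustration only.
    """
    match = THINK_RE.search(text)
    if match is None:
        # No complete <think>...</think> pair found: return the text
        # unchanged as the answer, with empty reasoning.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = (
    "<think>\nThe bee is the focal point.\n</think>\n\n"
    "**Image Description:**\n\nA bee on a pink cosmos flower."
)
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

With the inference code above, `split_reasoning(decoded)` would give the reasoning trace and the final description as separate strings.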

Model size: 4.3B params (Safetensors, BF16 tensors)