Wrong configs
There are a few issues with loading this Gemma3 model through AutoModelForCausalLM. The core problem is that the current config.json is set up for multi-modal use (with "text_config" and "vision_config") but is missing key text fields at the top level (such as "vocab_size" and "hidden_size") that the text-only classes look for. Specifically:
• There is no top-level "vocab_size" field, yet the checkpoint’s embedding matrix is sized [262208, hidden_size] (it includes extra tokens for images).
• The text fields are nested under "text_config", but Gemma3ForCausalLM expects them at the top level (config.hidden_size, config.num_hidden_layers, and so on). You can confirm the nesting with the short sketch after this list.
• The uploaded config references "Gemma3ForConditionalGeneration", which implies multi-modal usage. For text-only usage, the config has to be patched to match the real embedding dimension and expose the text fields at the top level.
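A minimal inspection sketch to verify this, reusing the checkpoint name from the fix further down (the exact set of keys depends on your transformers version):
```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-3-4b-pt")
print(type(cfg).__name__)                 # the multi-modal wrapper config class
print(type(cfg.text_config).__name__)     # the nested text-only config class
print(sorted(cfg.text_config.to_dict()))  # hidden_size, num_hidden_layers, vocab_size, ...
```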
Potential fixes:
1. Add text fields at the top level (e.g. "hidden_size": 2560, "vocab_size": 262208, etc.) so that AutoModelForCausalLM can read them directly without error.
2. Use a multi-modal class such as Gemma3ForConditionalGeneration, which explicitly handles both text_config and vision_config, if that’s the intended usage (sketched below).
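For option 2, a minimal sketch, assuming your transformers version ships Gemma3 support and exposes Gemma3ForConditionalGeneration at the top level:
```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

# Load the checkpoint with the multi-modal class the config actually names;
# no config patching is needed on this path.
model = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-4b-pt",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained("google/gemma-3-4b-pt")
```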
Patching the config manually, as in option 1, shows that the model loads fine once this is addressed:
```python
import torch
from transformers import (
    AutoConfig,
    AutoTokenizer,
    pipeline,
)
from transformers.models.gemma3.configuration_gemma3 import Gemma3TextConfig
from transformers.models.gemma3.modeling_gemma3 import Gemma3ForCausalLM

# Name or local path of the Gemma3 model checkpoint
model_name = "google/gemma-3-4b-pt"

# Load the multi-modal config
multi_config = AutoConfig.from_pretrained(model_name)

# Extract the text-specific config to a dict
text_cfg_dict = multi_config.text_config.to_dict()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ensure the vocab size matches the checkpoint's embedding shape
# (the checkpoint has embed_tokens.weight of size [262208, 2560], so we set 262208).
text_cfg_dict["vocab_size"] = 262208

# Add any special token IDs from the tokenizer
if tokenizer.pad_token_id is not None:
    text_cfg_dict["pad_token_id"] = tokenizer.pad_token_id
text_cfg_dict["bos_token_id"] = tokenizer.bos_token_id
text_cfg_dict["eos_token_id"] = tokenizer.eos_token_id

# Build a text-only config
text_config = Gemma3TextConfig(**text_cfg_dict)

# Load the model using that text config
model = Gemma3ForCausalLM.from_pretrained(
    model_name,
    config=text_config,
    torch_dtype=torch.bfloat16,
    device_map=None,
    low_cpu_mem_usage=False,
)

# Create a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

prompt = "Eiffel tower is located in"
output = pipe(prompt, max_new_tokens=50)
print("Generated text:", output[0]["generated_text"])
```
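Note that "vocab_size" is pinned to 262208 rather than taken from the tokenizer because the multi-modal checkpoint reserves extra embedding rows for the image tokens; if the value does not match the shape of embed_tokens.weight, from_pretrained will typically fail with a size-mismatch error.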