Example code with vision input needed to be `.to(model.device, dtype=model.dtype)` instead of `.to(model.dtype)`

#2 by theblackcat102 - opened

Great work on releasing these 2 VLMs. However, as the title says, running the provided `_inference` code with an image results in a runtime error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_CUDA___slow_conv2d_forward)

The issue is that `pixel_values` wasn't also moved to `model.device`, so it stayed on the CPU while the model weights were on CUDA.

Here's the working version:

def _inference(tokenizer, model, generation_config, prompt, pixel_values=None):
    # Move the tokenized text inputs to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    if pixel_values is None:
        output_tensors = model.generate(**inputs, generation_config=generation_config)
    else:
        # The image tensor must match both the model's device and dtype;
        # .to(model.dtype) alone leaves it on the CPU and triggers the error above.
        output_tensors = model.generate(
            **inputs,
            generation_config=generation_config,
            pixel_values=pixel_values.to(model.device, dtype=model.dtype),
        )
    output_str = tokenizer.decode(output_tensors[0])
    return output_str
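
For reference, a minimal sketch of calling the fixed helper end to end. The checkpoint name, image file, and prompt below are placeholders, not from this thread; substitute the actual model repo and processor usage for the released checkpoints.

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor, AutoTokenizer, GenerationConfig

model_id = "org/some-vlm"  # placeholder checkpoint name, not the actual repo
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")

# Image features come back on the CPU in float32; _inference moves them
# to model.device and model.dtype before generation.
image = Image.open("example.jpg")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generation_config = GenerationConfig(max_new_tokens=128)
print(_inference(tokenizer, model, generation_config, "Describe the image.", pixel_values=pixel_values))
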
MediaTek Research org

Thanks for your comments.
The code has been updated.

YC-Chen changed discussion status to closed