Example code with vision input needs `.to(model.device, dtype=model.dtype)` instead of `.to(model.dtype)`
#2 opened by theblackcat102
Great work on releasing these two VLMs. However, as the title says, running the provided `_inference` code with an image results in a runtime error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_CUDA___slow_conv2d_forward)
The issue is that `pixel_values` was never moved to CUDA (`model.device`) as well.
Here's the working version:

    def _inference(tokenizer, model, generation_config, prompt, pixel_values=None):
        # Tokenize the prompt and move every input tensor onto the model's device.
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        if pixel_values is None:
            output_tensors = model.generate(**inputs, generation_config=generation_config)
        else:
            # pixel_values must match the model's device *and* dtype, so use
            # .to(model.device, dtype=model.dtype) rather than just .to(model.dtype).
            output_tensors = model.generate(
                **inputs,
                generation_config=generation_config,
                pixel_values=pixel_values.to(model.device, dtype=model.dtype),
            )
        output_str = tokenizer.decode(output_tensors[0])
        return output_str
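For context, a minimal sketch of how the fixed function might be called. The checkpoint name, image path, and the use of `AutoImageProcessor`/`trust_remote_code` are assumptions for illustration, not details from this thread:

    import torch
    from PIL import Image
    from transformers import (
        AutoImageProcessor,
        AutoModelForCausalLM,
        AutoTokenizer,
        GenerationConfig,
    )

    # Hypothetical checkpoint id; substitute the actual VLM repo.
    model_id = "org/vlm-checkpoint"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
    )
    image_processor = AutoImageProcessor.from_pretrained(model_id)

    # The image processor returns float32 tensors on the CPU; _inference above
    # moves them to model.device and casts them to model.dtype before generate().
    image = Image.open("example.jpg")
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

    print(_inference(tokenizer, model, GenerationConfig(max_new_tokens=64),
                     "Describe the image.", pixel_values))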
Thanks for your comments.
The code has been updated.
YC-Chen changed discussion status to closed