Inference is quite slow #2
opened by silkyar
Hi, I am seeing GPU inference on my machine take as long as 3.2 s, while running the Hugging Face PyTorch model directly takes about 55 ms.
Wondering if anyone else has noticed this discrepancy, or if I am doing something wrong.
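For reference, here is a rough sketch of how the PyTorch baseline could be timed (the IDEA-Research/grounding-dino-tiny checkpoint, the warm-up call, and the torch.cuda.synchronize() calls are assumptions about a fair measurement, not necessarily exactly what I ran):

import time
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

pt_repo = "IDEA-Research/grounding-dino-tiny"   # assumed PyTorch checkpoint for the ONNX export
processor = AutoProcessor.from_pretrained(pt_repo)
model = AutoModelForZeroShotObjectDetection.from_pretrained(pt_repo).to("cuda").eval()

image = Image.open(IMAGE_PATH).convert("RGB")   # same IMAGE_PATH placeholder as in the snippet below
inputs = processor(images=image, text=["plant"], return_tensors="pt").to("cuda")

with torch.no_grad():
    model(**inputs)                              # warm-up, not timed
    torch.cuda.synchronize()
    start = time.perf_counter()
    model(**inputs)
    torch.cuda.synchronize()                     # flush pending GPU work before stopping the clock
    elapsed_ms = (time.perf_counter() - start) * 1e3
print(f"PyTorch forward: {elapsed_ms:.1f} ms")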
I'm doing something along the lines of the following, and I'm timing the ort_session.run() calls.
import onnxruntime as ort
import torch
from PIL import Image
from transformers import AutoProcessor

onnx_model_path = "onnx/model_fp16.onnx"  # downloaded from the repo's files
hf_repo = "onnx-community/grounding-dino-tiny-ONNX"
text_prompt = ["plant"]

image = Image.open(IMAGE_PATH).convert("RGB")
processor = AutoProcessor.from_pretrained(hf_repo)
processor_inputs = processor(images=image, text=text_prompt, return_tensors="pt")

device = torch.device("cuda")
providers = ["CUDAExecutionProvider"]
processor_inputs = {k: v.to(device) for k, v in processor_inputs.items()}
# ONNX Runtime takes numpy arrays, so the tensors go back to the CPU here
onnx_inputs = {k: v.detach().cpu().numpy() for k, v in processor_inputs.items()}

ort_session = ort.InferenceSession(onnx_model_path, providers=providers)
outputs = ort_session.run(None, onnx_inputs)
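And this is roughly the timing loop around run(), building on the snippet above (a sketch; the warm-up call and the get_providers() check are assumptions about what might matter, since the first run can include CUDA setup cost and ONNX Runtime silently falls back to CPU when the GPU provider isn't available):

import time

print(ort_session.get_providers())        # confirm CUDAExecutionProvider is actually active

ort_session.run(None, onnx_inputs)        # warm-up run (kernel/allocator setup), not timed
start = time.perf_counter()
outputs = ort_session.run(None, onnx_inputs)
print(f"ort_session.run: {(time.perf_counter() - start) * 1e3:.1f} ms")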