Inference is quite slow

#2
by silkyar - opened

Hi, I'm seeing GPU inference on my machine take as long as 3.2 s, while running the Hugging Face PyTorch model directly takes about 55 ms.
Wondering if anyone else has noticed this discrepancy, or if I'm doing something wrong.

I'm doing something along the lines of the following, and I'm timing the ort_session.run() calls.

import onnxruntime as ort
import torch
from PIL import Image
from transformers import AutoProcessor

onnx_model_path = "onnx/model_fp16.onnx"  # downloaded from files
hf_repo = "onnx-community/grounding-dino-tiny-ONNX"

text_prompt = ["plant"]
image = Image.open(IMAGE_PATH).convert("RGB")

# Preprocess the image/text pair with the model's processor
processor = AutoProcessor.from_pretrained(hf_repo)
processor_inputs = processor(images=image, text=text_prompt, return_tensors="pt")

device = torch.device("cuda")
providers = ["CUDAExecutionProvider"]

# Move tensors to GPU, then back to CPU NumPy arrays for ONNX Runtime
processor_inputs = {k: v.to(device) for k, v in processor_inputs.items()}
onnx_inputs = {k: v.detach().cpu().numpy() for k, v in processor_inputs.items()}

ort_session = ort.InferenceSession(onnx_model_path, providers=providers)
outputs = ort_session.run(None, onnx_inputs)
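For reference, here is a minimal sketch of how I could repeat the timing with a warm-up pass (the first run of a CUDA session often includes one-time kernel/graph setup, so timing only that call can overstate steady-state latency). It assumes ort_session and onnx_inputs from the snippet above; the run count is arbitrary.

import time

# Confirm the CUDA provider was actually loaded (ORT silently falls back to CPU otherwise)
print(ort_session.get_providers())

# Warm-up run: the first call typically absorbs one-time setup cost
ort_session.run(None, onnx_inputs)

# Time steady-state latency over several runs
n_runs = 10
start = time.perf_counter()
for _ in range(n_runs):
    ort_session.run(None, onnx_inputs)
print(f"avg latency: {(time.perf_counter() - start) / n_runs * 1000:.1f} ms")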