CUDA error: CUBLAS_STATUS_NOT_SUPPORTED

#25
by surak - opened

I can run other models fine in the same venv, but this one gives me the following error:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Cdesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)`

Full error from FastChat:

2025-03-14 15:33:52 | ERROR | stderr | [rank0]: Traceback (most recent call last):
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/fastchat/serve/vllm_worker.py", line 290, in <module>
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     engine = AsyncLLMEngine.from_engine_args(engine_args)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 649, in from_engine_args
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     engine = cls(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:              ^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 599, in __init__
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     self.engine = self._engine_class(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     super().__init__(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 277, in __init__
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     self._initialize_kv_caches()
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 426, in _initialize_kv_caches
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     self.model_executor.determine_num_available_blocks())
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 102, in determine_num_available_blocks
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     results = self.collective_rpc("determine_num_available_blocks")
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 316, in collective_rpc
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._run_workers(method, *args, **(kwargs or {}))
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     driver_worker_output = run_method(self.driver_worker, sent_method,
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/utils.py", line 2238, in run_method
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/worker.py", line 229, in determine_num_available_blocks
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     self.model_runner.profile_run()
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1243, in profile_run
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     self._dummy_run(max_num_batched_tokens, max_num_seqs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1354, in _dummy_run
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1742, in execute_model
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     hidden_or_intermediate_states = model_executable(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                                     ^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 519, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     vision_embeddings = self.get_multimodal_embeddings(**kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 490, in get_multimodal_embeddings
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     vision_embeddings = self._process_image_input(image_input)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 479, in _process_image_input
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     vision_outputs = self._image_pixels_to_features(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 469, in _image_pixels_to_features
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     image_features = vision_tower(pixel_values.to(dtype=target_dtype))
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self.vision_model(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     encoder_outputs = self.encoder(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                       ^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     hidden_states, _ = encoder_layer(hidden_states)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     hidden_states, _ = self.self_attn(hidden_states=hidden_states)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 190, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     qkv_states, _ = self.qkv_proj(hidden_states)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 474, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     output_parallel = self.quant_method.apply(self, input_, bias)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:   File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 191, in apply
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:     return F.linear(x, layer.weight, bias)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
ERROR 03-14 15:33:53 [multiproc_worker_utils.py:124] Worker VllmWorkerProcess pid 1386109 died, exit code: -15
INFO 03-14 15:33:53 [multiproc_worker_utils.py:128] Killing local vLLM worker processes
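The failing call is a bfloat16 GEMM (note the CUDA_R_16BF operands in the cublasGemmEx call), hit inside the SigLIP vision tower while vLLM profiles the model. One common cause is a GPU without native bf16 support (compute capability below 8.0, e.g. V100), which would also explain why other (fp16) models run fine in the same venv. Here is a minimal sketch to test that in isolation; the matrix sizes are just an assumption mimicking the SigLIP qkv projection, any bf16 matmul would do:

import torch

# Environment: PyTorch/CUDA versions and whether the GPU reports bf16 support.
print(torch.__version__, torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())

# Reproduce the failing op outside vLLM: the traceback dies in F.linear
# with bf16 inputs, so run the same kind of linear here in isolation.
# Shapes are illustrative (SigLIP hidden size 1152, qkv = 3 * 1152).
x = torch.randn(8, 1152, device="cuda", dtype=torch.bfloat16)
w = torch.randn(3456, 1152, device="cuda", dtype=torch.bfloat16)
out = torch.nn.functional.linear(x, w)
print("bf16 linear OK:", out.shape)

If this snippet raises the same CUBLAS_STATUS_NOT_SUPPORTED, the problem is the GPU/driver/cuBLAS combination rather than this model, and forcing fp16 may work around it. FastChat's vllm_worker exposes vLLM's engine args, so something like the following should pass the dtype through (the model path is a placeholder):

python3 -m fastchat.serve.vllm_worker --model-path <model> --dtype float16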