CUDA error: CUBLAS_STATUS_NOT_SUPPORTED
#25 · opened by surak
I can run other models fine in the same venv, but this one fails with the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Cdesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)`
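The full traceback below shows the failure in a bfloat16 GEMM (`CUDA_R_16BF`), so my first suspicion is that the GPU lacks bf16 support. A quick check to run in the same venv (just a guess at the cause, not confirmed):

```python
import torch

# cublasGemmEx with CUDA_R_16BF generally needs bf16-capable hardware
# (compute capability >= 8.0, i.e. Ampere or newer).
print(torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())
```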
Full error from FastChat:
```
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: Traceback (most recent call last):
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/fastchat/serve/vllm_worker.py", line 290, in <module>
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: engine = AsyncLLMEngine.from_engine_args(engine_args)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 649, in from_engine_args
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: engine = cls(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 599, in __init__
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: self.engine = self._engine_class(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: super().__init__(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 277, in __init__
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: self._initialize_kv_caches()
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 426, in _initialize_kv_caches
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: self.model_executor.determine_num_available_blocks())
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 102, in determine_num_available_blocks
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: results = self.collective_rpc("determine_num_available_blocks")
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 316, in collective_rpc
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._run_workers(method, *args, **(kwargs or {}))
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: driver_worker_output = run_method(self.driver_worker, sent_method,
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/utils.py", line 2238, in run_method
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/worker.py", line 229, in determine_num_available_blocks
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: self.model_runner.profile_run()
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1243, in profile_run
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: self._dummy_run(max_num_batched_tokens, max_num_seqs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1354, in _dummy_run
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return func(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1742, in execute_model
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: hidden_or_intermediate_states = model_executable(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 519, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: vision_embeddings = self.get_multimodal_embeddings(**kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 490, in get_multimodal_embeddings
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: vision_embeddings = self._process_image_input(image_input)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 479, in _process_image_input
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: vision_outputs = self._image_pixels_to_features(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/gemma3_mm.py", line 469, in _image_pixels_to_features
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: image_features = vision_tower(pixel_values.to(dtype=target_dtype))
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self.vision_model(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: encoder_outputs = self.encoder(
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: hidden_states, _ = encoder_layer(hidden_states)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: hidden_states, _ = self.self_attn(hidden_states=hidden_states)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/models/siglip.py", line 190, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: qkv_states, _ = self.qkv_proj(hidden_states)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return self._call_impl(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return forward_call(*args, **kwargs)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 474, in forward
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: output_parallel = self.quant_method.apply(self, input_, bias)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: File "/p/haicluster/llama/FastChat/sc_venv_sglang2/venv/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 191, in apply
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: return F.linear(x, layer.weight, bias)
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-03-14 15:33:52 | ERROR | stderr | [rank0]: RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
ERROR 03-14 15:33:53 [multiproc_worker_utils.py:124] Worker VllmWorkerProcess pid 1386109 died, exit code: -15
INFO 03-14 15:33:53 [multiproc_worker_utils.py:128] Killing local vLLM worker processes
```
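Note that the failure happens inside the SigLIP vision tower (`gemma3_mm.py` → `siglip.py`), which would explain why text-only models run fine in the same venv. For reference, a plain bf16 matmul should hit the same `cublasGemmEx`/`CUDA_R_16BF` path outside vLLM; this is a minimal sketch with arbitrary shapes, not a guaranteed reproduction:

```python
import torch

# A bare bf16 matmul dispatches to the same cublasGemmEx/CUDA_R_16BF
# call that fails in the F.linear frame of the traceback above.
a = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
b = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
c = a @ b
torch.cuda.synchronize()  # make sure the kernel actually runs here
print("bf16 matmul OK:", tuple(c.shape))
```

If that fails with the same `CUBLAS_STATUS_NOT_SUPPORTED`, the problem is in the torch/CUDA stack or the GPU itself rather than in this checkpoint; in that case, starting vLLM with `--dtype float16` might be a workaround, though half precision can cost accuracy on bf16-trained weights.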