How to run on an RTX 5090 / Blackwell?
I’ve been struggling to get it running on an RTX 5090 with 32 GB of VRAM. The official Docker images from Tencent don’t seem to be compatible with the Blackwell architecture. I even tried building vLLM from source via git clone, but no luck either.
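For reference, the source build I attempted was roughly along these lines (a sketch from memory; the TORCH_CUDA_ARCH_LIST value is my guess at the right setting for sm_120, not something I've confirmed):

```bash
# Rough reconstruction of the source build attempt
git clone https://github.com/vllm-project/vllm.git
cd vllm
# Blackwell (sm_120) corresponds to compute capability 12.0; whether this is
# sufficient depends on the installed CUDA toolkit and PyTorch build
export TORCH_CUDA_ARCH_LIST="12.0"
pip install -e .
```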
Any hints?
Hi Celsown,
We've updated the vLLM Docker image to CUDA 12.4 on top of the official vLLM base image. Could you check whether it's compatible with Blackwell?
But from your message it sounds like you've already tried a source build, so that may not work either.
What's the error message? Could you paste the full error log here?
Also, for a 5090 with 32 GB of VRAM, that's too little to run an 80B-parameter model even with int4 quantization (at int4 the weights alone are roughly 40 GB, before the KV cache).
I'll give it a try on the RTX PRO 6000 Blackwell with 96 GB of VRAM, but the Docker image shown on the model page seems to use CUDA 12.4, which won't work with Blackwell. If I remember correctly, Blackwell chips require CUDA builds with sm_120 support.
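For what it's worth, one way to confirm what compute capability the card reports (assuming a reasonably recent driver and PyTorch):

```bash
# Query compute capability from the driver (needs a recent nvidia-smi)
nvidia-smi --query-gpu=name,compute_cap --format=csv
# Or ask PyTorch; Blackwell should report (12, 0), i.e. sm_120
python3 -c "import torch; print(torch.cuda.get_device_capability(0))"
```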
For now, the latest vLLM has already merged the model support patch, so you can use the official vLLM OpenAI Docker image:
vllm/vllm-openai:latest
This image includes Hunyuan-A13B-Instruct model support and CUDA 12.8.
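A minimal launch sketch with that image might look like the following (the Hugging Face repo id and the cache-mount path are assumptions; adjust flags for your setup):

```bash
# Serve the model with the official vLLM OpenAI-compatible server image.
# The repo id below is an assumption; point it at the actual model location.
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model tencent/Hunyuan-A13B-Instruct \
  --trust-remote-code
```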
@aaron-newsome