How to run on an RTX 5090 / Blackwell?

#27
by celsowm - opened

I’ve been struggling to get it running on an RTX 5090 with 32 GB of VRAM. The official Docker images from Tencent don’t seem to be compatible with the Blackwell architecture. I even tried building vLLM from source via git clone, but no luck there either.

Any hints?

Tencent org

Hi celsowm,

We've updated the vLLM Docker image to CUDA 12.4 on top of the official vLLM base image; could you check whether it is compatible with Blackwell?

That said, since your message says you've already tried building from source, the updated image may not work either.

What is your error message? Could you paste the full error log here?

Tencent org

For a 5090 with 32 GB of VRAM, the memory is too small to run an ~80B-parameter model, even with INT4 quantization.
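
As a rough sanity check (assuming roughly 80B total parameters for Hunyuan-A13B): 80B params × 0.5 bytes/param for INT4 is about 40 GB of weights alone, before KV cache and activations, which already exceeds 32 GB.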

I'll give it a try on the RTX PRO 6000 Blackwell with 96 GB of VRAM, but the Docker image shown on the model page seems to use CUDA 12.4, which won't work with Blackwell. If I remember correctly, Blackwell chips require a CUDA build with sm_120 support.
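
As a quick check (assuming a reasonably recent driver, where nvidia-smi supports the compute_cap query), you can confirm what compute capability the card reports; Blackwell cards like the 5090 and RTX PRO 6000 should show 12.0 (sm_120):

nvidia-smi --query-gpu=name,compute_cap --format=csv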

For now, the latest vLLM has already merged the model support patch, so you can use the vLLM OpenAI Docker image:

vllm/vllm-openai:latest 

This image has Hunyuan-A13B-Instruct model support and is built with CUDA 12.8.
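
A minimal sketch of how one might launch it on a single Blackwell card (the model id, port mapping, and cache mount here are assumptions; adjust them for your setup):

docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model tencent/Hunyuan-A13B-Instruct \
  --trust-remote-code

Once it starts, the server exposes an OpenAI-compatible API on port 8000.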
@aaron-newsome
