It was harder than we expected, but we have finally integrated cuDNN Paged Attention into our models!
Read the article here: https://app.thestage.ai/blog/Integrating-cuDNN-Paged-Attention-to-TheStage-AI-Inference?id=8
Llama-8B with cuDNN paged attention, including B200 support: TheStageAI/Elastic-Llama-3.1-8B-Instruct
Mistral-Small-24B with cuDNN paged attention, including B200 support: TheStageAI/Elastic-Mistral-Small-3.1-24B-Instruct-2503