mamba2-hybrid model inference speed vs Transformer-vLLM solutions
#3
by
LarryLi
- opened
How does the real token-generation speed of the mamba2-hybrid model compare with classical Transformer-based models? Transformer models can use vLLM to speed up inference. Can mamba2-hybrid models be accelerated with it as well?
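For what it's worth, one way to get a concrete number is to time decoding yourself and report tokens per second. The sketch below is a minimal, framework-agnostic timing helper — the `fake_generate` stub is a placeholder assumption; in practice you would wrap a real `model.generate(...)` call (e.g. a vLLM-served Transformer vs. a mamba2-hybrid checkpoint loaded with Hugging Face `transformers`) and compare the two numbers on the same prompt and `max_new_tokens`:

```python
import time

def tokens_per_second(generate_fn, num_new_tokens):
    """Time one generation call and return throughput in tokens/sec.

    generate_fn: zero-argument callable that decodes num_new_tokens tokens,
    e.g. lambda: model.generate(**inputs, max_new_tokens=num_new_tokens).
    """
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return num_new_tokens / elapsed

# Hypothetical stub standing in for a real model.generate call:
# pretend decoding 64 tokens takes about 50 ms.
def fake_generate():
    time.sleep(0.05)

tps = tokens_per_second(fake_generate, 64)
print(f"{tps:.0f} tokens/sec")
```

For a fair comparison, run a warm-up call first (to exclude weight loading and CUDA kernel compilation) and average over several runs at the same batch size and sequence length.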