Load model into TGI

#27
by schauppi - opened

Hello - thanks for the great work!

I want to load this model in text-generation-inference v0.9.3 (the latest) on 2x 3090s with 24 GB of VRAM each. In this GitHub thread (https://github.com/huggingface/text-generation-inference) they said it is not possible / will not fit.

You mentioned here https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/2#64ba51be41078fd9a059c1a6 that it would be possible.

Could you please point me in the right direction for running this model in TGI with my setup?
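
For concreteness, this is roughly the launch command I have in mind, sketched from the TGI README - the port, volume path, and token limits below are placeholders for my setup, not values I have confirmed to work:

```sh
# Sketch: launch TGI 0.9.3 with this GPTQ model sharded across two GPUs.
# --quantize gptq loads the 4-bit GPTQ weights; --num-shard 2 splits the
# model tensor-parallel across both 3090s. Port, volume path, and token
# limits are placeholders to adjust.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:0.9.3 \
    --model-id TheBloke/Llama-2-70B-chat-GPTQ \
    --quantize gptq \
    --num-shard 2 \
    --max-input-length 1024 \
    --max-total-tokens 2048
```

My back-of-the-envelope math is that the 4-bit weights are about 35 GB (70B parameters x 4 bits), which should fit in the combined 48 GB with some headroom for the KV cache, assuming the token limits stay modest. Once the server is up, I would sanity-check it with something like:

```sh
# Sketch: query the running server via TGI's /generate endpoint.
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Llama 2?", "parameters": {"max_new_tokens": 64}}'
```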
