VRAM usage
#34 opened by Yier48
When encoding long texts (tens of thousands of tokens) with the model, VRAM usage grows so large that even a 48 GB GPU runs out of memory.
I would like to reduce VRAM usage by quantizing the model. Do you have any recommended methods?
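For reference, this is the kind of quantized load I had in mind: a minimal sketch using 4-bit quantization via bitsandbytes in transformers. The model ID here is a placeholder, not the actual model I'm using.

```python
# Minimal sketch: load the model in 4-bit (NF4) with bitsandbytes.
# "your-model-id" is a placeholder for the actual checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit
    bnb_4bit_quant_type="nf4",               # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,    # do matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained("your-model-id")
model = AutoModel.from_pretrained(
    "your-model-id",
    quantization_config=bnb_config,
    device_map="auto",                       # place layers automatically
)
```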
Or are there other ways to reduce VRAM consumption?
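For example, would encoding the text chunk by chunk under `torch.inference_mode()` help? A rough sketch of what I mean is below; the function name and chunk size are my own placeholders, and I realize chunks would not attend to each other, which may affect the resulting embeddings.

```python
# Rough sketch: encode a long document in fixed-size chunks under
# inference mode, so no autograd state is kept and only one chunk's
# activations live on the GPU at a time. chunk_size must fit within
# the model's context window.
import torch

def encode_in_chunks(model, tokenizer, text, chunk_size=4096):
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    pieces = []
    with torch.inference_mode():
        for start in range(0, ids.size(0), chunk_size):
            chunk = ids[start:start + chunk_size].unsqueeze(0).to(model.device)
            out = model(input_ids=chunk).last_hidden_state
            pieces.append(out.cpu())  # move results off the GPU right away
    return torch.cat(pieces, dim=1)
```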
Thanks!