Xin Lai committed · a94cd3e · Parent: c48aae5

Update README.md

Former-commit-id: 43231082c353de3c4ad680dac08b5a482bf87dae
README.md
CHANGED
@@ -57,7 +57,7 @@ To use `bf16` or `fp16` data type for inference:
 ```
 CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='bf16'
 ```
-To use `8bit` or `4bit` data type for inference:
+To use `8bit` or `4bit` data type for inference (this enables running the 13B model on a single 24G or 12G GPU at some cost of generation quality):
 ```
 CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='fp16' --load_in_8bit
 CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='fp16' --load_in_4bit
 ```
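As a rough sanity check on why quantization makes the 13B checkpoint fit on the GPUs mentioned in the added line, here is a back-of-the-envelope weight-memory estimate. This is illustrative only: it counts weights alone and ignores activations, the KV cache, and framework overhead, so real usage will be somewhat higher.

```python
# Approximate weight memory for a 13B-parameter model at different precisions.
# Illustrative only: ignores activations, KV cache, and framework overhead.
PARAMS = 13e9  # 13 billion parameters

def weight_gb(bits_per_param: float) -> float:
    """Weight memory in GB for the given per-parameter bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"fp16/bf16: {weight_gb(16):.1f} GB")  # ~26 GB  -> exceeds a 24G GPU
print(f"int8:      {weight_gb(8):.1f} GB")   # ~13 GB  -> fits a 24G GPU
print(f"4bit:      {weight_gb(4):.1f} GB")   # ~6.5 GB -> fits a 12G GPU
```

This matches the claim in the diff: 8-bit weights bring the model under a 24G card's memory, and 4-bit weights bring it under 12G, at some cost in generation quality.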