Commit a94cd3e by Xin Lai · Parent(s): c48aae5

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -57,7 +57,7 @@ To use `bf16` or `fp16` data type for inference:
  ```
  CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='bf16'
  ```
- To use `8bit` or `4bit` data type for inference:
+ To use `8bit` or `4bit` data type for inference (this enables running the 13B model on a single 24GB or 12GB GPU at some cost of generation quality):
  ```
  CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='fp16' --load_in_8bit
  CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='fp16' --load_in_4bit
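
For context, below is a minimal sketch of the kind of loading logic that `--load_in_8bit` / `--load_in_4bit` flags typically select, assuming the script builds on Hugging Face `transformers` with `bitsandbytes`. LISA's actual `chat.py` uses its own model class and argument parsing, so the `load_quantized` helper and `AutoModelForCausalLM` usage here are illustrative assumptions, not the repository's code.

```python
# Hypothetical sketch: how 8-bit/4-bit loading flags usually map onto
# transformers + bitsandbytes. Not LISA's actual chat.py implementation.
import torch
from transformers import AutoModelForCausalLM

def load_quantized(version: str, load_in_8bit: bool = False, load_in_4bit: bool = False):
    kwargs = {"torch_dtype": torch.float16}  # matches --precision='fp16'
    if load_in_4bit:
        # ~0.5 byte/param: a 13B model takes roughly 7 GB, fitting a 12GB GPU.
        kwargs["load_in_4bit"] = True
    elif load_in_8bit:
        # ~1 byte/param: a 13B model takes roughly 13 GB, fitting a 24GB GPU.
        kwargs["load_in_8bit"] = True
    return AutoModelForCausalLM.from_pretrained(version, **kwargs)

model = load_quantized("xinlai/LISA-13B-llama2-v0", load_in_8bit=True)
```

Quantizing weights to 8 or 4 bits rounds them to a coarser grid, which is what the commit message means by "some cost of generation quality" relative to `bf16`/`fp16` inference.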