Xin Lai committed · a94cd3e · Parent: c48aae5

Update README.md

Former-commit-id: 43231082c353de3c4ad680dac08b5a482bf87dae
README.md
CHANGED
@@ -57,7 +57,7 @@ To use `bf16` or `fp16` data type for inference:
 ```
 CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='bf16'
 ```
-To use `8bit` or `4bit` data type for inference:
+To use `8bit` or `4bit` data type for inference (this enables running the 13B model on a single 24G or 12G GPU at some cost of generation quality):
 ```
 CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='fp16' --load_in_8bit
 CUDA_VISIBLE_DEVICES=0 python3 chat.py --version='xinlai/LISA-13B-llama2-v0' --precision='fp16' --load_in_4bit
 ```
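As a rough sanity check on why quantization makes the 13B checkpoint fit on the GPUs mentioned in the added line, here is a back-of-the-envelope weight-memory estimate. This is illustrative only: it counts weights alone and ignores activations, the KV cache, and framework overhead, so real usage will be somewhat higher.

```python
# Approximate weight memory for a 13B-parameter model at different precisions.
# Illustrative only: ignores activations, KV cache, and framework overhead.
PARAMS = 13e9  # 13 billion parameters

def weight_gb(bits_per_param: float) -> float:
    """Weight memory in GB for the given per-parameter bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"fp16/bf16: {weight_gb(16):.1f} GB")  # ~26 GB  -> exceeds a 24G GPU
print(f"int8:      {weight_gb(8):.1f} GB")   # ~13 GB  -> fits a 24G GPU
print(f"4bit:      {weight_gb(4):.1f} GB")   # ~6.5 GB -> fits a 12G GPU
```

This matches the claim in the diff: 8-bit weights bring the model under a 24G card's memory, and 4-bit weights bring it under 12G, at some cost in generation quality.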