load_in_4bit=True
README.md
CHANGED
@@ -126,7 +126,7 @@ We offer a toolkit to help you handle various types of visual input more conveni
 
 ```bash
 # It's highly recommanded to use `[decord]` feature for faster video loading.
-pip install qwen-vl-utils[decord]==0.0.8
+pip install qwen-vl-utils[decord]==0.0.8 bitsandbytes
 ```
 
 If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-vl-utils` which will fall back to using torchvision for video processing. However, you can still [install decord from source](https://github.com/dmlc/decord?tab=readme-ov-file#install-from-source) to get decord used when loading video.
@@ -141,15 +141,16 @@ from qwen_vl_utils import process_vision_info
 
 # default: Load the model on the available device(s)
 model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-    "
+    "jarvisvasu/Qwen2.5-VL-3B-Instruct-4bit", torch_dtype="auto", device_map="auto", load_in_4bit=True
 )
 
 # We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
 # model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-#     "
+#     "jarvisvasu/Qwen2.5-VL-3B-Instruct-4bit",
 #     torch_dtype=torch.bfloat16,
 #     attn_implementation="flash_attention_2",
 #     device_map="auto",
+#     load_in_4bit=True,
 # )
 
 # default processer
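Note that recent transformers releases deprecate passing `load_in_4bit=True` directly to `from_pretrained` in favor of an explicit `BitsAndBytesConfig`. A minimal sketch of the equivalent 4-bit load under that API; the `nf4` quant type and bfloat16 compute dtype are illustrative choices, not part of the diff above:

```python
import torch
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# Equivalent 4-bit setup expressed as an explicit quantization config.
# bnb_4bit_quant_type and bnb_4bit_compute_dtype are illustrative defaults,
# not values taken from this commit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "jarvisvasu/Qwen2.5-VL-3B-Instruct-4bit",
    device_map="auto",
    quantization_config=bnb_config,
)
```

Both spellings require `bitsandbytes` to be installed, which is why the install line above adds it.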