load_in_4bit=True
README.md
CHANGED
@@ -126,7 +126,7 @@ We offer a toolkit to help you handle various types of visual input more conveni
 
 ```bash
 # It's highly recommanded to use `[decord]` feature for faster video loading.
-pip install qwen-vl-utils[decord]==0.0.8
+pip install qwen-vl-utils[decord]==0.0.8 bitsandbytes
 ```
 
 If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-vl-utils` which will fall back to using torchvision for video processing. However, you can still [install decord from source](https://github.com/dmlc/decord?tab=readme-ov-file#install-from-source) to get decord used when loading video.
@@ -141,15 +141,16 @@ from qwen_vl_utils import process_vision_info
 
 # default: Load the model on the available device(s)
 model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-    "
+    "jarvisvasu/Qwen2.5-VL-3B-Instruct-4bit", torch_dtype="auto", device_map="auto", load_in_4bit=True
 )
 
 # We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
 # model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-#     "
+#     "jarvisvasu/Qwen2.5-VL-3B-Instruct-4bit",
 #     torch_dtype=torch.bfloat16,
 #     attn_implementation="flash_attention_2",
 #     device_map="auto",
+#     load_in_4bit=True,
 # )
 
 # default processer
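Note that recent transformers releases deprecate passing `load_in_4bit=True` directly to `from_pretrained` in favor of an explicit `BitsAndBytesConfig`. A minimal sketch of the equivalent 4-bit load under that API; the `nf4` quant type and bfloat16 compute dtype are illustrative choices, not part of the diff above:

```python
import torch
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

# Equivalent 4-bit setup expressed as an explicit quantization config.
# bnb_4bit_quant_type and bnb_4bit_compute_dtype are illustrative defaults,
# not values taken from this commit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "jarvisvasu/Qwen2.5-VL-3B-Instruct-4bit",
    device_map="auto",
    quantization_config=bnb_config,
)
```

Both spellings require `bitsandbytes` to be installed, which is why the install line above adds it.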