Just a quick quantization of Qwen2.5-VL-7B-Instruct using llm-compressor. Used the example script with a MAX_SEQUENCE_LENGTH of 32768, truncation disabled (truncation hit a tokenizer bug in vLLM), and NUM_CALIBRATION_SAMPLES of 512.
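
For reference, here's a minimal sketch of what that oneshot run looks like, modeled on the multimodal example script in the llm-compressor repo with the settings above plugged in. The GPTQ W4A16 scheme, the flickr30k calibration set, the prompt text, and the output directory name are assumptions carried over from that example, not necessarily what produced this exact checkpoint:

```python
# Sketch of the llm-compressor oneshot flow, assuming the multimodal
# example script. Scheme (GPTQ W4A16), calibration set (flickr30k), and
# prompt are assumptions from that example; only MAX_SEQUENCE_LENGTH,
# truncation=False, and NUM_CALIBRATION_SAMPLES reflect the note above.
import torch
from datasets import load_dataset
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"
MAX_SEQUENCE_LENGTH = 32768      # as used for this quant
NUM_CALIBRATION_SAMPLES = 512    # as used for this quant

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Image-text calibration data (dataset choice assumed from the example).
ds = load_dataset("lmms-lab/flickr30k", split=f"test[:{NUM_CALIBRATION_SAMPLES}]")

def preprocess(example):
    # Build a single-image chat prompt and tokenize it. truncation=False
    # mirrors the note above about the vLLM tokenizer bug.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What does the image show?"},
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return processor(
        text=[text],
        images=[example["image"]],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=False,
    )

ds = ds.map(preprocess, remove_columns=ds.column_names)

def data_collator(batch):
    # Calibration runs one sample at a time for vision models.
    assert len(batch) == 1
    return {k: torch.tensor(v) for k, v in batch[0].items()}

# Quantize linear layers, leaving the LM head and vision tower in
# full precision (ignore list taken from the example script).
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:visual.*"],
)

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    data_collator=data_collator,
)

model.save_pretrained("Qwen2.5-VL-7B-Instruct-quantized", save_compressed=True)
processor.save_pretrained("Qwen2.5-VL-7B-Instruct-quantized")
```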