Just a quick quantization of Qwen2.5-VL-7B-Instruct using llm-compressor. Used the example script with a MAX_SEQUENCE_LENGTH of 32768, truncation disabled (truncation hit a tokenizer bug in vLLM), and NUM_CALIBRATION_SAMPLES of 512.
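
For reference, here's a minimal sketch of what that oneshot run looks like, modeled on the multimodal example script in the llm-compressor repo with the settings above plugged in. The GPTQ W4A16 scheme, the flickr30k calibration set, the prompt text, and the output directory name are assumptions carried over from that example, not necessarily what produced this exact checkpoint:

```python
# Sketch of the llm-compressor oneshot flow, assuming the multimodal
# example script. Scheme (GPTQ W4A16), calibration set (flickr30k), and
# prompt are assumptions from that example; only MAX_SEQUENCE_LENGTH,
# truncation=False, and NUM_CALIBRATION_SAMPLES reflect the note above.
import torch
from datasets import load_dataset
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"
MAX_SEQUENCE_LENGTH = 32768      # as used for this quant
NUM_CALIBRATION_SAMPLES = 512    # as used for this quant

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Image-text calibration data (dataset choice assumed from the example).
ds = load_dataset("lmms-lab/flickr30k", split=f"test[:{NUM_CALIBRATION_SAMPLES}]")

def preprocess(example):
    # Build a single-image chat prompt and tokenize it. truncation=False
    # mirrors the note above about the vLLM tokenizer bug.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What does the image show?"},
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return processor(
        text=[text],
        images=[example["image"]],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=False,
    )

ds = ds.map(preprocess, remove_columns=ds.column_names)

def data_collator(batch):
    # Calibration runs one sample at a time for vision models.
    assert len(batch) == 1
    return {k: torch.tensor(v) for k, v in batch[0].items()}

# Quantize linear layers, leaving the LM head and vision tower in
# full precision (ignore list taken from the example script).
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:visual.*"],
)

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    data_collator=data_collator,
)

model.save_pretrained("Qwen2.5-VL-7B-Instruct-quantized", save_compressed=True)
processor.save_pretrained("Qwen2.5-VL-7B-Instruct-quantized")
```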