Just a quick quantization of Qwen2.5-VL-7B-Instruct using llm-compressor. I used the example script with MAX_SEQUENCE_LENGTH set to 32768, truncation disabled in the tokenizer (to work around a bug I hit in vLLM), and NUM_CALIBRATION_SAMPLES set to 512.
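
For reference, the run above roughly corresponds to an llm-compressor `oneshot` call like the sketch below. This is a hedged reconstruction, not the exact script: the W4A16 GPTQ scheme, the `flickr30k` calibration dataset, the ignored-module patterns, and the output directory name are all assumptions on my part; only the model ID, MAX_SEQUENCE_LENGTH, truncation setting, and NUM_CALIBRATION_SAMPLES come from the note above.

```python
# Sketch of the quantization run, assuming the upstream llm-compressor
# vision-model example. Scheme, dataset, and ignore list are assumptions.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"
MAX_SEQUENCE_LENGTH = 32768      # as stated above
NUM_CALIBRATION_SAMPLES = 512    # as stated above

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
# Truncation is disabled at tokenization time (truncation=False) to work
# around the vLLM tokenizer bug mentioned above.
processor = AutoProcessor.from_pretrained(MODEL_ID)

recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",                    # assumed scheme, not stated above
    ignore=["lm_head", "re:visual.*"], # assumed: skip head and vision tower
)

oneshot(
    model=model,
    dataset="flickr30k",               # assumed calibration set, not stated
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

# Hypothetical output directory name for the compressed checkpoint.
model.save_pretrained("Qwen2.5-VL-7B-Instruct-quantized")
processor.save_pretrained("Qwen2.5-VL-7B-Instruct-quantized")
```

The resulting checkpoint can then be served directly with vLLM, which reads the quantization config saved alongside the weights.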