Nice model, any info on scripts used to quantize?

#1
by RonanMcGovern - opened

and also commands for running with vLLM? Thanks

Just pass the stub to vLLM and it will run.
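For anyone wanting something more concrete, here is a minimal sketch of what "pass the stub" can look like with the `vllm serve` CLI. The thread never names the exact stub, so `MODEL_STUB` is a hypothetical placeholder, and `build_vllm_serve_command` is an illustrative helper of my own, not part of vLLM. The two optional flags are only needed if the model does not fit on one GPU or you want a shorter context window.

```python
import shlex

# Hypothetical placeholder: substitute the Hugging Face stub of the model you want to serve.
MODEL_STUB = "your-org/your-fp8-model"


def build_vllm_serve_command(stub, tensor_parallel_size=1, max_model_len=None):
    """Build the argument list for `vllm serve <stub>` (illustrative helper, not a vLLM API)."""
    cmd = ["vllm", "serve", stub]
    if tensor_parallel_size > 1:
        # Shard the model across several GPUs.
        cmd += ["--tensor-parallel-size", str(tensor_parallel_size)]
    if max_model_len is not None:
        # Cap the context length to reduce KV-cache memory.
        cmd += ["--max-model-len", str(max_model_len)]
    return cmd


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell.
    print(shlex.join(build_vllm_serve_command(MODEL_STUB, tensor_parallel_size=2)))
```

With no extra options this reduces to `vllm serve <stub>`, which starts an OpenAI-compatible server.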

For the scripts, we have a bunch of fp8 examples in the vllm-project/llm-compressor repo. Just swap in the Llama 3.3 HF stub and you're good to go.
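As a rough sketch of what such a script looks like, the snippet below follows the general shape of the fp8 examples in llm-compressor. Several things here are assumptions rather than anything stated in this thread: the `FP8_DYNAMIC` scheme, the `meta-llama/Llama-3.3-70B-Instruct` stub, the output directory naming, and the helper names `quantized_dir` and `quantize_fp8`. Check the repo's current examples before relying on it, since import paths have moved between releases.

```python
# Assumption: this is the Llama 3.3 stub meant in the thread.
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"


def quantized_dir(model_id, suffix="-FP8-Dynamic"):
    """Derive a local output directory name from a Hugging Face stub (illustrative helper)."""
    return model_id.split("/")[-1] + suffix


def quantize_fp8(model_id=MODEL_ID):
    """Quantize Linear layers to fp8 and save the result (sketch; needs GPUs and model access)."""
    # Heavy imports are kept inside the function so the module loads without them.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llmcompressor.modifiers.quantization import QuantizationModifier

    try:
        from llmcompressor import oneshot
    except ImportError:
        # Older llm-compressor releases exposed oneshot here.
        from llmcompressor.transformers import oneshot

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Assumed scheme: dynamic per-token activation scales, so no calibration data is needed.
    recipe = QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    )
    oneshot(model=model, recipe=recipe)

    save_dir = quantized_dir(model_id)
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)
    return save_dir


# quantize_fp8()  # uncomment to run the full quantization
```

The saved directory can then be passed to vLLM in place of a hub stub.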

mgoin changed discussion status to closed