How to quantize the hunyuan model to fp8

#1
by hz094 - opened

Hi sir, thanks for the excellent work. I am curious about how you quantized the hunyuan model; could you share more details?

You need torch and llama.cpp. You could try converting the safetensors file to gguf and testing it first; simply execute: ggc t

Screenshot 2024-12-27 001107.png

Screenshot 2024-12-27 001148.png

Actually, if you just want fp8, the updated node has a tool, tensor cutter, which helps you make your own fp8 scaled model (about 50% smaller in file size) in an easy way; you don't need llama.cpp or any extra dependency in that case.
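For anyone wondering what "fp8 scaled" means in practice, here is a minimal pure-Python sketch of the general idea. It is not the tensor cutter's actual code. It assumes the e4m3 variant of fp8 (3 mantissa bits, largest finite value 448) and one scale per tensor; the function names and the sample weights are made up for illustration. Each value is stored in 1 byte instead of 2, which would match the roughly 50% file size reduction if the source weights are 16-bit.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the fp8 e4m3 format (assumed variant)

def round_to_e4m3(v: float) -> float:
    """Simulate rounding v to the nearest e4m3-representable value."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), E4M3_MAX)      # saturate instead of overflowing
    _, e = math.frexp(a)           # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)           # exponent of the leading bit; -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)        # 3 mantissa bits give 8 steps per power of two
    return sign * round(a / step) * step

def quantize_scaled(weights):
    """Per-tensor scaled quantization: returns (quantized values, scale)."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0   # map the largest weight onto 448
    return [round_to_e4m3(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate weights by multiplying the scale back in."""
    return [x * scale for x in q]

weights = [0.013, -0.25, 0.0021, 0.07, -0.0009]    # made-up sample values
q, scale = quantize_scaled(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("scale:", scale)
print("max abs error:", max_err)
```

The scale is what keeps small weights from collapsing to zero: without it, most values in a typical weight tensor would sit in the coarse low end of the fp8 range.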
