Update README.md

README.md

@@ -17,7 +17,7 @@ tags:
 - conversational
 ---
-This repository hosts the **Phi4-mini-instruct** model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) using int4 weight-only quantization and the [hqq](https://mobiusml.github.io/hqq_blog/) algorithm. This work is brought to you by the PyTorch team. This model can be used directly or served using [vLLM](https://docs.vllm.ai/en/latest/) for significant VRAM reduction and speedup on A100 GPUs.
+This repository hosts the **Phi4-mini-instruct** model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) using int4 weight-only quantization and the [hqq](https://mobiusml.github.io/hqq_blog/) algorithm. This work is brought to you by the PyTorch team. This model can be used directly or served using [vLLM](https://docs.vllm.ai/en/latest/) for 67% VRAM reduction (2.98 GB needed) and speedup on A100 GPUs.
 ---
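
As a sanity check on the figures in the added line: a 67% reduction that leaves 2.98 GB implies an unquantized footprint of roughly 9 GB. The sketch below only does that arithmetic. The baseline it prints is derived from the two numbers in the diff, not a measured value, and `implied_baseline_gb` is a hypothetical helper name used for illustration.

```python
def implied_baseline_gb(quantized_gb: float, reduction: float) -> float:
    """Return the baseline size implied by a quantized size and a fractional reduction.

    Assumes reduction = 1 - quantized / baseline, i.e. the usual
    "X% less memory than the original" reading of the README sentence.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return quantized_gb / (1.0 - reduction)


# Figures taken from the README line added in this change.
baseline = implied_baseline_gb(quantized_gb=2.98, reduction=0.67)
print(f"implied baseline: {baseline:.2f} GB")
```

Because 67% is a rounded figure, the implied baseline is only approximate; the actual unquantized peak memory may differ by a few tenths of a GB.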