Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ tags:
 - conversational
 ---
 
-This repository hosts the **Phi4-mini-instruct** model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) using int4 weight-only quantization and the [hqq](https://mobiusml.github.io/hqq_blog/) algorithm. This work is brought to you by the PyTorch team. This model can be used directly or served using [vLLM](https://docs.vllm.ai/en/latest/) for
+This repository hosts the **Phi4-mini-instruct** model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) using int4 weight-only quantization and the [hqq](https://mobiusml.github.io/hqq_blog/) algorithm. This work is brought to you by the PyTorch team. This model can be used directly or served using [vLLM](https://docs.vllm.ai/en/latest/) for 67% VRAM reduction (2.98 GB needed) and speedup on A100 GPUs.
 
 ---
 
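As a rough sanity check on the figures in the added line, the sketch below back-computes the baseline memory implied by "67% VRAM reduction (2.98 GB needed)". It assumes the percentage is measured against the unquantized model's footprint, which the diff does not state; the helper name `implied_baseline_gb` is hypothetical and the result is an estimate, not a measured value.

```python
def implied_baseline_gb(quantized_gb: float, reduction: float) -> float:
    """Baseline memory implied by a quantized footprint and a fractional reduction.

    Assumes quantized = baseline * (1 - reduction); this relationship is an
    assumption about how the README's percentage was computed.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return quantized_gb / (1.0 - reduction)


# Figures taken from the README line: 2.98 GB needed, 67% reduction.
baseline = implied_baseline_gb(2.98, 0.67)
print(f"Implied baseline: {baseline:.2f} GB")
```

Under that assumption the baseline comes out at roughly 9 GB, so the quantized model needs about a third of the original memory.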