This is a 4-bit AWQ (Activation-aware Weight Quantization) quantized version of Hermes 3 3B.

For details on the original model, please see the [**Hermes 3 Technical Report**](https://arxiv.org/abs/2408.11857).

### What is AWQ 4-bit Quantization?

AWQ (Activation-aware Weight Quantization) is a quantization technique designed to optimize large language models for efficient inference while minimizing performance loss. The **4-bit AWQ** version of this model:
- **Reduces memory footprint**, enabling deployment on lower-end hardware (e.g., consumer GPUs and edge devices).
- **Speeds up inference**, reducing response latency, since smaller weights mean less memory traffic per generated token.
- **Preserves performance**, as AWQ selectively quantizes weights based on activation sensitivity, ensuring minimal loss in capability.
## Base Model Information
Hermes 3 3B is a generalist language model fine-tuned from **Llama-3.2 3B**, with improvements in: