noxneural committed (verified)
Commit a52d994 · 1 Parent(s): 33d7428

Update README.md

Files changed (1)
  1. README.md +0 -8
README.md CHANGED
@@ -45,14 +45,6 @@ This is a 4-bit AWQ (Activation-aware Weight Quantization) quantized version of
 
 For details on the original model, please see the [**Hermes 3 Technical Report**](https://arxiv.org/abs/2408.11857).
 
-### What is AWQ 4-bit Quantization?
-
-AWQ (Activation-aware Weight Quantization) is a quantization technique designed to optimize large language models for efficient inference while minimizing performance loss. The **4-bit AWQ** version of this model:
-
-- **Reduces memory footprint**, enabling deployment on lower-end hardware (e.g., consumer GPUs and edge devices).
-- **Speeds up inference**, making response times faster while maintaining accuracy.
-- **Preserves performance**, as AWQ selectively quantizes weights based on activation sensitivity, ensuring minimal loss in capability.
-
 ## Base Model Information
 
 Hermes 3 3B is a generalist language model fine-tuned from **Llama-3.2 3B**, with improvements in:
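For reference, the section removed above summarizes what 4-bit AWQ provides: smaller weights, faster inference, and accuracy preserved by quantizing around activation statistics. A minimal sketch of loading a 4-bit AWQ checkpoint such as this one with the Hugging Face `transformers` stack (which relies on `autoawq` being installed) might look like the following; the repo id is a placeholder and the arguments are assumptions, not instructions taken from this repository.

```python
# Minimal sketch, not from this repo's docs: loading a 4-bit AWQ checkpoint
# with transformers. Requires `pip install transformers autoawq` and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NAMESPACE/Hermes-3-Llama-3.2-3B-AWQ"  # placeholder: substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels compute with fp16 activations
    device_map="auto",          # place the quantized weights on the available GPU(s)
)

prompt = "Explain activation-aware weight quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

At 4 bits per weight, weight storage is roughly a quarter of the fp16 footprint (about 1.5 GB for ~3B parameters versus ~6 GB in fp16, before activation and KV-cache overhead), which is where the memory-footprint claim in the removed section comes from.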
 