This is a 4-bit AWQ (Activation-aware Weight Quantization) quantized version of Hermes 3 3B.

For details on the original model, please see the [**Hermes 3 Technical Report**](https://arxiv.org/abs/2408.11857).

### What is AWQ 4-bit Quantization?

AWQ (Activation-aware Weight Quantization) is a quantization technique designed to optimize large language models for efficient inference while minimizing performance loss. The **4-bit AWQ** version of this model:
- **Reduces memory footprint**, enabling deployment on lower-end hardware (e.g., consumer GPUs and edge devices).
- **Speeds up inference**, reducing response latency, since smaller weights mean less memory traffic per generated token.
- **Preserves performance**, as AWQ selectively quantizes weights based on activation sensitivity, ensuring minimal loss in capability.
## Base Model Information
Hermes 3 3B is a generalist language model fine-tuned from **Llama-3.2 3B**, with improvements in: