Update README.md
README.md
@@ -20,7 +20,7 @@ This repository contains optimized versions of the [gemma-2b-it](https://hugging
 ## ONNX Models
 
 Here are some of the optimized configurations we have added:
-- **ONNX model for int4
+- **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
 - **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
 
 ## Usage
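For readers unfamiliar with the RTN scheme named in the CPU/mobile bullet, below is a minimal sketch of per-tensor int4 round-to-nearest quantization. The function names and weight values are illustrative only; the actual models are produced by ONNX Runtime's quantization tooling, not by this snippet.

```python
# Sketch of int4 round-to-nearest (RTN) quantization: scale so the largest
# magnitude maps near the int4 limit, round each weight, then clamp to the
# signed int4 range [-8, 7]. Names and values here are hypothetical.

def rtn_int4_quantize(weights):
    """Quantize a list of floats to signed int4 codes plus a scale."""
    scale = max(abs(w) for w in weights) / 7.0   # largest weight -> ~+/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def rtn_int4_dequantize(q, scale):
    """Recover approximate float weights from int4 codes."""
    return [v * scale for v in q]

weights = [0.02, -0.51, 0.33, 0.70, -0.07]
codes, scale = rtn_int4_quantize(weights)
approx = rtn_int4_dequantize(codes, scale)
print(codes)   # int4 codes, each in [-8, 7]
print(approx)  # dequantized approximation of the original weights
```

The accuracy/latency trade-off the README describes (Acc=1 vs. Acc=4) comes from how coarsely such rounding is applied, e.g. per-block scaling granularity, not from a different rounding rule.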