pstan commited on
Commit
afc75b1
·
verified ·
1 Parent(s): 7e94e39

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ This repository contains optimized versions of the [gemma-2b-it](https://hugging
20
  ## ONNX Models
21
 
22
  Here are some of the optimized configurations we have added:
23
- - **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
24
  - **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
25
 
26
  ## Usage
 
20
  ## ONNX Models
21
 
22
  Here are some of the optimized configurations we have added:
23
+ - **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
24
  - **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
25
 
26
  ## Usage