amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid

Text Generation

satreysa committed 12 days ago

Commit adce840 · verified · 1 Parent(s): 0f6105f

Update README.md

Files changed (1)

README.md +4 -29

@@ -16,39 +16,14 @@ tags:
 # meta-llama/Llama-2-7b-chat-hf
 - ## Introduction
-  - Quantization Tool: Quark 0.6.0
-  - OGA Model Builder: v0.5.1
-  - Postprocess
 - ## Quantization Strategy
   - AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
   - Excluded Layers: None
-  ```
-  python3 quantize_quark.py \
-        --model_dir "$model" \
-        --output_dir "$output_dir" \
-        --quant_scheme w_uint4_per_group_asym \
-        --num_calib_data 128 \
-        --quant_algo awq \
-        --dataset pileval_for_awq_benchmark \
-        --seq_len 512 \
-        --model_export quark_safetensors \
-        --data_type float16 \
-        --exclude_layers [] \
-        --custom_mode awq
-  ```
-- ## OGA Model Builder
-  ```
-  python builder.py \
-    -i <quantized safetensor model dir> \
-    -o <oga model output dir> \
-    -p int4 \
-    -e dml
-  ```
-- PostProcessed to generate Hybrid Model
--
 - ## Quick Start
-For quickstart, refer to hybrid-llm-artifacts_1.3.0.zip available in [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html)
 #### Evaluation scores
 The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. Perplexity score measured for prompt length 2k is 7.1518.

 # meta-llama/Llama-2-7b-chat-hf
 - ## Introduction
+  This model was prepared using the AMD Quark Quantization tool, followed by necessary post-processing.
 - ## Quantization Strategy
   - AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
   - Excluded Layers: None
 - ## Quick Start
+For quickstart, refer to [Ryzen AI doucmentation](https://ryzenai.docs.amd.com/en/latest/hybrid_oga.html)
 #### Evaluation scores
 The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. Perplexity score measured for prompt length 2k is 7.1518.
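
The quantization strategy named in the README (group size 128, asymmetric, UINT4 weights) can be illustrated with a minimal sketch. This is not Quark's implementation and it omits the activation-aware scaling that AWQ performs before quantizing. It is a plain round-to-nearest quantizer that only shows what "per-group asymmetric UINT4" means. The function names `quantize_group_asym_uint4` and `dequantize_group_asym_uint4` are hypothetical and chosen for illustration.

```python
def quantize_group_asym_uint4(weights, group_size=128):
    """Quantize a flat list of floats to UINT4 codes (0..15).

    Each group of `group_size` consecutive weights gets its own scale and
    zero point (asymmetric quantization). Illustrative only: real AWQ also
    rescales weights using activation statistics, which is not shown here.
    """
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    qmax = 15  # largest value representable in 4 unsigned bits
    codes, scales, zero_points = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Extend the range to include 0.0 so zero is exactly representable.
        lo = min(min(group), 0.0)
        hi = max(max(group), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero_point = min(qmax, max(0, round(-lo / scale)))
        scales.append(scale)
        zero_points.append(zero_point)
        for w in group:
            q = round(w / scale) + zero_point
            codes.append(min(qmax, max(0, q)))  # clamp into the UINT4 range
    return codes, scales, zero_points


def dequantize_group_asym_uint4(codes, scales, zero_points, group_size=128):
    """Map UINT4 codes back to approximate float weights."""
    return [
        (q - zero_points[i // group_size]) * scales[i // group_size]
        for i, q in enumerate(codes)
    ]


if __name__ == "__main__":
    # Two groups of 128 synthetic weights in roughly [-0.1, 0.1].
    weights = [((i * 37) % 101 - 50) / 500.0 for i in range(256)]
    codes, scales, zero_points = quantize_group_asym_uint4(weights)
    restored = dequantize_group_asym_uint4(codes, scales, zero_points)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"groups={len(scales)} worst abs error={worst:.6f}")
```

Storing one scale and one zero point per 128 weights is what lets the 4-bit codes track the local range of each group. A smaller group size would reduce the reconstruction error at the cost of more stored metadata.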