neuralmagic/Qwen2-VL-72B-Instruct-FP8-dynamic

Image-Text-to-Text

text-generation-inference

compressed-tensors

Update README.md

Commit ac009f1 (verified) · 1 parent: db3d325 · shubhrapandit committed on Feb 25

Files changed (1)

README.md +3 -3

@@ -19,8 +19,8 @@ library_name: transformers
   - **Input:** Vision-Text
   - **Output:** Text
 - **Model Optimizations:**
-  - **Weight quantization:** INT4
-  - **Activation quantization:** FP16
+  - **Weight quantization:** FP8
+  - **Activation quantization:** FP8
 - **Release Date:** 2/24/2025
 - **Version:** 1.0
 - **Model Developers:** Neural Magic
@@ -29,7 +29,7 @@ Quantized version of [Qwen/Qwen2-VL-72B-Instruct](https://huggingface.co/Qwen/Qw
 ### Model Optimizations
-This model was obtained by quantizing the weights of [Qwen/Qwen2-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) to INT8 data type, ready for inference with vLLM >= 0.5.2.
+This model was obtained by quantizing the weights of [Qwen/Qwen2-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) to FP8 data type, ready for inference with vLLM >= 0.5.2.
 ## Deployment
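
The corrected metadata lists FP8 for both weights and activations, and the repository name ends in `-dynamic`. As a rough illustration of what dynamic FP8 activation quantization can look like, here is a minimal pure-Python sketch. It assumes the E4M3 format (maximum magnitude 448) and one symmetric scale per token computed at runtime; neither detail is stated in this commit, and the helper names `round_to_e4m3`, `quantize_dynamic_fp8`, and `dequantize` are hypothetical, not part of vLLM or any Neural Magic library.

```python
import math

# Assumption: FP8 E4M3 (4 exponent bits, 3 mantissa bits), largest finite value 448.
FP8_E4M3_MAX = 448.0
MIN_NORMAL_EXP = -6  # smallest normal exponent in E4M3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on the E4M3 grid, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    _, e = math.frexp(mag)                  # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)        # exponent of the leading bit
    step = 2.0 ** (exp - 3)                 # grid spacing with 3 mantissa bits
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_dynamic_fp8(token: list[float]) -> tuple[list[float], float]:
    """Compute a per-token scale at runtime and map the values onto the FP8 grid."""
    amax = max(abs(v) for v in token)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in token], scale


def dequantize(q: list[float], scale: float) -> list[float]:
    """Map FP8 grid values back to the original range."""
    return [v * scale for v in q]


if __name__ == "__main__":
    activations = [0.5, -2.0, 4.0, 0.013]
    q, scale = quantize_dynamic_fp8(activations)
    print("scale:", scale)
    print("fp8 grid values:", q)
    print("reconstructed:", dequantize(q, scale))
```

"Dynamic" in this sketch means the scale is derived from each token's own maximum at inference time rather than from a calibration set; the actual scheme used for this checkpoint should be confirmed from the model card and its quantization config.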