pytorch/Qwen3-4B-8da4w

Text Generation

text-generation-inference

metascroy committed on May 13

Commit 64df908 · verified · 1 Parent(s): a57930f

Update README.md

Files changed (1)

README.md +3 -2

README.md CHANGED

@@ -26,9 +26,10 @@ We provide the [quantized pte](https://huggingface.co/pytorch/Qwen3-4B-8da4w/blo
 # Running in a mobile app
 The [pte file](https://huggingface.co/pytorch/Qwen3-4B-8da4w/blob/main/qwen3-4B-8da4w-1024-cxt.pte) can be run with ExecuTorch on a mobile phone.  See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
-On iPhone 15 Pro, the model runs at [TODO: ADD] tokens/sec and uses [TODO: ADD] Mb of memory.
-[TODO: ADD SCREENSHOT]
+On iPhone 15 Pro, the model runs at 14.8 tokens/sec and uses 3379 Mb of memory.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66049fc71116cebd1d3bdcf4/eVHB7fVllmwVauKJvGu0d.png)
 # Quantization Recipe