phi-3.5-onnx-qnn
phi-3.5-onnx-qnn is an ONNX QNN int4-quantized version of Microsoft Phi-3.5-mini-instruct, providing a small, fast implementation optimized for NPU inference on Windows ARM64 AI PCs with Qualcomm Snapdragon X Elite processors.
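To try the model locally, the files can be pulled from the Hugging Face Hub first. A minimal sketch, assuming the huggingface_hub package is installed; the local directory name is arbitrary:

```python
# Download the quantized model files from the Hugging Face Hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="llmware/phi-3.5-onnx-qnn",
    local_dir="phi-3.5-onnx-qnn",  # target folder name is arbitrary
)
```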
Model Description
- Developed by: microsoft
- Model type: phi3
- Parameters: 3.8 billion
- Model Parent: microsoft/Phi-3.5-mini-instruct
- Language(s) (NLP): English
- License: Apache 2.0
- Uses: Chat, general-purpose LLM
- Quantization: int4
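Once downloaded, a QNN build like this is typically run through the ONNX Runtime GenAI bindings. The following is a minimal sketch, not a recipe confirmed by the model authors: it assumes a QNN-enabled onnxruntime-genai install on the ARM64 device, and the generation API shown here (e.g. `append_tokens`) has changed across onnxruntime-genai releases, so match it to the installed version.

```python
# Minimal sketch: stream tokens from the int4 QNN model with onnxruntime-genai.
import onnxruntime_genai as og

model = og.Model("./phi-3.5-onnx-qnn")  # path from the download step (assumption)
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-3.5 chat format, per the base model's documentation.
prompt = "<|user|>\nWhat is an NPU?<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Generate one token at a time and print it as it arrives.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```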