Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web
This is the same model as the official Phi-3 ONNX model, with a few changes to make it work with onnxruntime-web:
- the model is fp16 with int4 block quantization for weights
- the 'logits' output is fp32
- the model uses multi-head attention (MHA) instead of grouped-query attention (GQA)
- the ONNX file and the external data file each need to stay below 2 GB to be cacheable in Chromium (see the loading sketch after this list)
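
As a minimal sketch of how these pieces fit together in the browser, the snippet below creates a session with onnxruntime-web's WebGPU execution provider and supplies the external weight file through the `externalData` session option. The file names are assumptions — substitute the actual `.onnx` and `.onnx.data` files from this repo — and the `path` entry must match the file name recorded inside the model.

```typescript
import * as ort from 'onnxruntime-web/webgpu';

// Hypothetical file names -- replace with the actual files in this repo.
const MODEL_URL = 'phi3-mini-4k-instruct.onnx';
const DATA_URL = 'phi3-mini-4k-instruct.onnx.data';

async function createSession(): Promise<ort.InferenceSession> {
  // Fetch the external weight file explicitly; because it stays under the
  // ~2 GB Chromium limit, the response is eligible for the browser cache.
  const weights = new Uint8Array(await (await fetch(DATA_URL)).arrayBuffer());

  return ort.InferenceSession.create(MODEL_URL, {
    executionProviders: ['webgpu'],   // the fp16 model targets the WebGPU EP
    externalData: [{ path: DATA_URL, data: weights }],
  });
}
```

Once the session is created, the `logits` output comes back as an fp32 tensor, so no fp16 conversion is needed on the JavaScript side.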