Gemma 2 2B quantized for wllama (under 2 GB).

q4_0_4_8 is much faster than q4_k when run with llama.cpp directly; with wllama, the two are about the same speed.