Create README.md
README.md
ADDED
@@ -0,0 +1,13 @@
EXL3 quants of [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct)

At the moment, these are all converted with 8-bpw output layers (the H8 in the labels below). I'm still investigating why there's a small but noticeable drop in accuracy at 6-bpw; it likely has to do with the tied embeddings.

[2.00 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/2.0bpw)
[2.25 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/2.25bpw)
[2.50 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/2.5bpw)
[3.00 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/3.0bpw)
[3.50 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/3.5bpw)
[4.00 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/4.0bpw)
[5.00 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/5.0bpw)
[6.00 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/6.0bpw)
[8.00 bits per weight / H8](https://huggingface.co/turboderp/Phi-4-mini-instruct-exl3/tree/8.0bpw)