Update README.md
README.md
CHANGED
@@ -23,6 +23,19 @@ cargo run --example quantized-t5 --release -- \
 
 On my laptop (CPU, running in WSL) I get: `45 tokens generated (0.48 token/s)`
 
+## weights
+
+
+Below are the weights/file names in this repo:
+
+| Weight File Name  | Quant Format | Size (GB) |
+|-------------------|--------------|-----------|
+| flan-ul2-q2k.gguf | q2k          | 6.39      |
+| flan-ul2-q3k.gguf | q3k          | 8.36      |
+| flan-ul2-q4k.gguf | q4k          | 10.9      |
+| flan-ul2-q6k.gguf | q6k          | 16        |
+
+From initial testing, `q2k` appears to be too low-precision and produces poor or incoherent output. `q3k` and higher produce coherent output.
 
 ## setup
 