pszemraj/candle-flanUL2-quantized

pszemraj committed on Aug 26, 2024

Commit b5b4be0 · verified · 1 Parent(s): 534089b

Update README.md

README.md +13 -0

@@ -23,6 +23,19 @@ cargo run --example quantized-t5 --release  -- \
 On my laptop (CPU, running in WSL) I get: `45 tokens generated (0.48 token/s)`
+## weights
+Below are the weights/file names in this repo:
+| Weight File Name        | Quant Format | Size (GB) |
+|-------------------------|--------------|-----------|
+| flan-ul2-q2k.gguf       | q2k          | 6.39      |
+| flan-ul2-q3k.gguf       | q3k          | 8.36      |
+| flan-ul2-q4k.gguf       | q4k          | 10.9      |
+| flan-ul2-q6k.gguf       | q6k          | 16        |
+From initial testing, `q2k` appears to be too low-precision and produces poor or incoherent output. `q3k` and higher produce coherent output.
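
As a rough illustration of how the table and the note about `q2k` might guide a choice of file, here is a minimal Rust sketch. It picks the largest listed file that fits a given size budget and skips `q2k`. The helper `pick_weight_file` and the idea of a "budget" are hypothetical and not part of this repo or of candle. The sketch also assumes that file size is only a lower bound on the memory actually needed at inference time.

```rust
/// Rows copied from the weights table: (file name, quant format, size in GB).
static WEIGHTS: [(&str, &str, f64); 4] = [
    ("flan-ul2-q2k.gguf", "q2k", 6.39),
    ("flan-ul2-q3k.gguf", "q3k", 8.36),
    ("flan-ul2-q4k.gguf", "q4k", 10.9),
    ("flan-ul2-q6k.gguf", "q6k", 16.0),
];

/// Hypothetical helper (not a candle API): return the largest file whose
/// size is at most `budget_gb`, skipping q2k because initial testing
/// reported poor or incoherent output for it.
fn pick_weight_file(budget_gb: f64) -> Option<&'static str> {
    WEIGHTS
        .iter()
        .filter(|(_, quant, size)| *quant != "q2k" && *size <= budget_gb)
        .max_by(|a, b| a.2.total_cmp(&b.2))
        .map(|(name, _, _)| *name)
}

fn main() {
    // Example budgets in GB; these values are arbitrary.
    for budget in [8.0, 12.0, 32.0] {
        match pick_weight_file(budget) {
            Some(name) => println!("{budget} GB budget -> {name}"),
            None => println!("{budget} GB budget -> no coherent quant fits"),
        }
    }
}
```

Leaving `q2k` out of the candidates means a small budget returns `None` instead of silently falling back to a file that is reported to give incoherent output.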
 ## setup