Update README.md
README.md
CHANGED
@@ -35,9 +35,10 @@ Models are released as sharded safetensors files.
 
 | Bits | GS | GPTQ Dataset | Max Seq Len | Size | VRAM |
 | ---- | -- | ----------- | ------- | ---- | ---- |
-| 4 | 128 | [wikitext2-v1](Salesforce/wikitext) | 131,072 | 2.
+| 4 | 128 | [wikitext2-v1](Salesforce/wikitext) | 131,072 | 2.28 Gb | 22-32 Gb*
+
 * Depends on maximum sequence length parameter (KV cache utilization) used with vLLM or Transformers
-
+
 <!-- README_GPTQ.md-provided-files end -->
 
 ## Original Model Card Below
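The VRAM footnote says memory use depends on the maximum sequence length because of the KV cache. The sketch below shows why, using the common estimate for a decoder-only transformer: one key and one value vector per layer, per KV head, per token. The layer count, KV head count and head dimension are hypothetical placeholders, not values from this model card. The estimate also leaves out the quantized weights, activations and framework overhead, so it will not reproduce the table's VRAM column. It only illustrates how the cache grows linearly with sequence length.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes.

    The factor 2 accounts for storing both a key and a value vector for every
    layer, KV head and token. bytes_per_elem=2 assumes an fp16/bf16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical configuration, NOT taken from this model card.
    cfg = dict(num_layers=28, num_kv_heads=8, head_dim=128)
    for seq_len in (32_768, 131_072):
        gib = kv_cache_bytes(seq_len=seq_len, **cfg) / 2**30
        print(f"{seq_len:>7} tokens: {gib:.1f} GiB")  # → 3.5 GiB, then 14.0 GiB
```

Under these assumed dimensions, going from 32,768 to the full 131,072-token limit quadruples the cache from 3.5 GiB to 14.0 GiB. This is why lowering the maximum sequence length passed to vLLM or Transformers reduces the VRAM needed.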