Update README.md
README.md
@@ -25,7 +25,7 @@ Each branch contains an individual bits per weight, with the main one containing
 
 <a href="https://huggingface.co/machinez/zephyr-orpo-141b-A35b-v0.1-exl2/tree/2_75">2.75 bits per weight - Fits Quad Nvidia Tesla P100 16gb at 16k context</a>
 
-## Sample instructions to load in TabbyAPI @ 1.5bpw on 3x Nvidia Tesla P100 16gb at 4k context
+## Sample instructions to load in TabbyAPI @ 1.5bpw on 3x Nvidia Tesla P100 16gb at 4k context. ~14 tok/s
 ```JSON
 {
 "name": "Machinez_zephyr-orpo-141b-A35b-v0.1_1.5bpw",
@@ -52,7 +52,7 @@ Each branch contains an individual bits per weight, with the main one containing
 }
 ```
 
-## Sample instructions to load in TabbyAPI @ 2.75bpw on 4x Nvidia Tesla P100 16gb at 16k context
+## Sample instructions to load in TabbyAPI @ 2.75bpw on 4x Nvidia Tesla P100 16gb at 16k context. ~5.6 tok/s
 ```JSON
 {
 "name": "Machinez_zephyr-orpo-141b-A35b-v0.1_2.75bpw",
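
As a rough sanity check on the card counts quoted in the headings, the weight footprint at each bits-per-weight setting can be estimated with a few lines of Python. This is only a sketch under stated assumptions: the 141e9 parameter count is read off the "141b" in the model name, `weight_size_gb` is a hypothetical helper rather than part of TabbyAPI or ExLlamaV2, and the estimate covers weights only, ignoring the KV cache, activations, and per-card overhead that also consume memory.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: total parameter count taken from "141b" in the model name.
N_PARAMS = 141e9
# Nominal per-card memory, as quoted in the README headings.
GPU_MEMORY_GB = 16

# (bits per weight, number of P100 cards) pairs from the two headings.
for bpw, n_gpus in [(1.5, 3), (2.75, 4)]:
    weights = weight_size_gb(N_PARAMS, bpw)
    total = n_gpus * GPU_MEMORY_GB
    print(f"{bpw} bpw: ~{weights:.1f} GB of weights vs {total} GB across {n_gpus} cards")
```

Whatever remains after the weights is what the context cache and runtime overhead have to fit into, so the gap between the two numbers matters more than the weight figure alone.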