Spaces: Running on Zero
Update app.py
app.py CHANGED
@@ -136,24 +136,19 @@ def run_inference(message, history, model_picked, context_size, max_output):
     print(result)
     return result
 
-description="""
-A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!
-
+description="""A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!
 The model will be loaded once the GPU is available. By default, this Space loads Pixtral at 4bpw from the following repository: [turboderp/pixtral-12b-exl2](https://huggingface.co/turboderp/pixtral-12b-exl2). Other quantization options are available.
-
 The current version of ExllamaV2 running is the dev branch, not the master branch: [ExllamaV2](https://github.com/turboderp/exllamav2/tree/dev).
 
-The model at **4bpw and 16k context size fits in less than 12GB of VRAM**!
+The model at **4bpw and 16k context size fits in less than 12GB of VRAM**, and at **2.5bpw with a short context it can potentially fit in 8GB of VRAM**!
 
 The current default settings are:
 - Model Quant: 4.0bpw
 - Context Size: 16k tokens
 - Max Output: 512 tokens
-
 You can select other quants and experiment!
 
-Thanks, turboderp!
-"""
+Thanks, turboderp!"""
 examples = [
     [
         {"text": "What are the similarities and differences between these two experiments?", "files": ["test_image_1.jpg", "test_image_2.jpg"]},
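For readers who want to reproduce what the updated description advertises, the sketch below shows one way to load a Pixtral EXL2 quant at the Space's defaults (4.0bpw quant, 16k context, 512 max output tokens) with ExllamaV2. This is not the Space's actual `app.py` code: the per-quant branch name (`"4.0bpw"`) follows turboderp's usual EXL2 repo layout but is an assumption here, and the dev-branch vision components that Pixtral needs for image input are omitted, so this is text-only generation.

```python
# Minimal sketch, not the Space's code. Assumes each quant lives on its own
# branch of turboderp/pixtral-12b-exl2 (e.g. "4.0bpw"), per the usual EXL2
# repo layout, and omits the dev branch's vision path for brevity.
from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Fetch the 4.0bpw quant (assumed branch name).
model_dir = snapshot_download("turboderp/pixtral-12b-exl2", revision="4.0bpw")

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 16384  # the Space's default 16k context

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, splitting across available VRAM
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(
    prompt="Describe the Pixtral model in one sentence.",
    max_new_tokens=512,  # the Space's default max output
)
print(output)
```

Swapping `revision="2.5bpw"` and lowering `config.max_seq_len` is, per the updated description, how the model could potentially fit in 8GB of VRAM, assuming the repository exposes a branch for that quant.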