pandora-s committed
Commit 4624aaa · verified · 1 Parent(s): 6559e7b

Update app.py

Files changed (1)
  1. app.py +3 -8
app.py CHANGED
@@ -136,24 +136,19 @@ def run_inference(message, history, model_picked, context_size, max_output):
     print(result)
     return result
 
-description="""
-A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!
-
+description="""A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!
 The model will be loaded once the GPU is available. This space specifically will load by default Pixtral at 4bpw from the following repository: [turboderp/pixtral-12b-exl2](https://huggingface.co/turboderp/pixtral-12b-exl2). Other quantization options are available.
-
 The current version of ExllamaV2 running is the dev branch, not the master branch: [ExllamaV2](https://github.com/turboderp/exllamav2/tree/dev).
 
-The model at **4bpw and 16k context size fits in less than 12GB of VRAM**!
+The model at **4bpw and 16k context size fits in less than 12GB of VRAM**, and at **2.5bpw and short context can potentially fit in 8GB of VRAM**!
 
 The current default settings are:
 - Model Quant: 4.0bpw
 - Context Size: 16k tokens
 - Max Output: 512 tokens
-
 You can select other quants and experiment!
 
-Thanks, turboderp!
-"""
+Thanks, turboderp!"""
 examples = [
     [
         {"text": "What are the similarities and differences between these two experiments?", "files":["test_image_1.jpg", "test_image_2.jpg"]},