Spaces: Running on Zero
Update app.py
app.py CHANGED
@@ -136,24 +136,19 @@ def run_inference(message, history, model_picked, context_size, max_output):
     print(result)
     return result
 
-description="""
-A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!
-
+description="""A demo chat interface with Pixtral 12B EXL2 Quants, deployed using **ExllamaV2**!
 The model will be loaded once the GPU is available. By default, this Space loads Pixtral at 4bpw from the following repository: [turboderp/pixtral-12b-exl2](https://huggingface.co/turboderp/pixtral-12b-exl2). Other quantization options are available.
-
 The current version of ExllamaV2 running is the dev branch, not the master branch: [ExllamaV2](https://github.com/turboderp/exllamav2/tree/dev).
 
-The model at **4bpw and 16k context size fits in less than 12GB of VRAM**!
+The model at **4bpw and 16k context size fits in less than 12GB of VRAM**, and at **2.5bpw with a short context it can potentially fit in 8GB of VRAM**!
 
 The current default settings are:
 - Model Quant: 4.0bpw
 - Context Size: 16k tokens
 - Max Output: 512 tokens
-
 You can select other quants and experiment!
 
-Thanks, turboderp!
-"""
+Thanks, turboderp!"""
 examples = [
     [
         {"text": "What are the similarities and differences between these two experiments?", "files": ["test_image_1.jpg", "test_image_2.jpg"]},
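For readers who want to reproduce what the updated description advertises, the sketch below shows one way to load a Pixtral EXL2 quant at the Space's defaults (4.0bpw quant, 16k context, 512 max output tokens) with ExllamaV2. This is not the Space's actual `app.py` code: the per-quant branch name (`"4.0bpw"`) follows turboderp's usual EXL2 repo layout but is an assumption here, and the dev-branch vision components that Pixtral needs for image input are omitted, so this is text-only generation.

```python
# Minimal sketch, not the Space's code. Assumes each quant lives on its own
# branch of turboderp/pixtral-12b-exl2 (e.g. "4.0bpw"), per the usual EXL2
# repo layout, and omits the dev branch's vision path for brevity.
from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Fetch the 4.0bpw quant (assumed branch name).
model_dir = snapshot_download("turboderp/pixtral-12b-exl2", revision="4.0bpw")

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 16384  # the Space's default 16k context

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, splitting across available VRAM
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(
    prompt="Describe the Pixtral model in one sentence.",
    max_new_tokens=512,  # the Space's default max output
)
print(output)
```

Swapping `revision="2.5bpw"` and lowering `config.max_seq_len` is, per the updated description, how the model could potentially fit in 8GB of VRAM, assuming the repository exposes a branch for that quant.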