change order
README.md
CHANGED
@@ -28,6 +28,17 @@ IQ here refers to Imatrix Quantization. For performance comparison against regul
|
|
28 |
|
29 |
Evaluated on two programming tasks: debugging and generation. It may be a bit subjective. `DeepSeekV2 Coder Instruct` is ranked lower because their privacy policy says that they may collect "text input, prompt" and there's no way around it.
|
30 |
|
31 |
| **Rank** | **Model Name** | **Token Speed (tokens/s)** | **Debugging Performance** | **Code Generation Performance** | **Notes** |
|
32 |
|----------|----------------------------------------------|----------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
|
33 |
| 1 | codestral-22b-v0.1-IQ6_K.gguf (this model) | 34.21 | Excellent at complex debugging, often surpasses GPT-4o and Claude-3.5 | Good, but may not be on par with GPT-4o | Best overall for debugging in my workflow, use Balanced Mode. |
|
@@ -41,15 +52,6 @@ Evaluated on two programming tasks: debugging and generation. It may be a bit su
|
|
41 |
| 9 | Trinity-2-Codestral-22B-Q6_K_L | N/A | Poor, similar issues to DeepSeekV2 in outputting the same code | Decent, but often repeats code | Similar problem to DeepSeekV2, not recommended for my complex tasks. |
|
42 |
| 10 | DeepSeekV2 Coder Lite Instruct Q_8L | N/A | Poor, repeats code similar to other models in its family | Not as effective in my context | Not recommended overall based on my criteria. |
|
43 |
|
44 |
-
Code debugging prompt template used:
|
45 |
-
```
|
46 |
-
<code>
|
47 |
-
<current output>
|
48 |
-
<the problem description of the current output>
|
49 |
-
<expected output (in English is fine)>
|
50 |
-
<any hints>
|
51 |
-
Think step by step. Solve this problem without removing any existing functionalities, logic, or checks, except any incorrect code that interferes with your edits.
|
52 |
-
```
|
53 |
|
54 |
<br>
|
55 |
|
|
|
28 |
|
29 |
Evaluated on two programming tasks: debugging and generation. It may be a bit subjective. `DeepSeekV2 Coder Instruct` is ranked lower because their privacy policy says that they may collect "text input, prompt" and there's no way around it.
|
30 |
|
31 |
+
|
32 |
+
Code debugging prompt template used:
|
33 |
+
```
|
34 |
+
<code>
|
35 |
+
<current output>
|
36 |
+
<the problem description of the current output>
|
37 |
+
<expected output (in English is fine)>
|
38 |
+
<any hints>
|
39 |
+
Think step by step. Solve this problem without removing any existing functionalities, logic, or checks, except any incorrect code that interferes with your edits.
|
40 |
+
```
|
41 |
+
|
42 |
| **Rank** | **Model Name** | **Token Speed (tokens/s)** | **Debugging Performance** | **Code Generation Performance** | **Notes** |
|
43 |
|----------|----------------------------------------------|----------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
|
44 |
| 1 | codestral-22b-v0.1-IQ6_K.gguf (this model) | 34.21 | Excellent at complex debugging, often surpasses GPT-4o and Claude-3.5 | Good, but may not be on par with GPT-4o | Best overall for debugging in my workflow, use Balanced Mode. |
|
|
|
52 |
| 9 | Trinity-2-Codestral-22B-Q6_K_L | N/A | Poor, similar issues to DeepSeekV2 in outputting the same code | Decent, but often repeats code | Similar problem to DeepSeekV2, not recommended for my complex tasks. |
|
53 |
| 10 | DeepSeekV2 Coder Lite Instruct Q_8L | N/A | Poor, repeats code similar to other models in its family | Not as effective in my context | Not recommended overall based on my criteria. |
|
54 |
|
55 |
|
56 |
<br>
|
57 |
|
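
The code debugging prompt template that this change moves above the table can be filled in programmatically. The following is a minimal sketch, not part of the README: the helper name `build_debug_prompt`, its parameter names, and the choice to treat hints as optional are all assumptions made for illustration. Only the field order and the closing instruction come from the template itself.

```python
# Hypothetical helper (not from the README): fills the debugging prompt
# template in the order the README lists its fields.

# Closing instruction, copied from the template.
INSTRUCTION = (
    "Think step by step. Solve this problem without removing any existing "
    "functionalities, logic, or checks, except any incorrect code that "
    "interferes with your edits."
)


def build_debug_prompt(code, current_output, problem, expected_output, hints=""):
    """Return the prompt text: one template field per block, then the instruction."""
    parts = [code, current_output, problem, expected_output]
    if hints:  # assumption: hints may be omitted when there are none
        parts.append(hints)
    parts.append(INSTRUCTION)
    # Strip stray leading/trailing whitespace so each field starts on its own line.
    return "\n".join(part.strip() for part in parts)


if __name__ == "__main__":
    prompt = build_debug_prompt(
        code="def add(a, b):\n    return a - b",
        current_output="add(2, 3) returns -1",
        problem="The function subtracts instead of adding.",
        expected_output="add(2, 3) should return 5",
        hints="Check the operator in the return statement.",
    )
    print(prompt)
```

The resulting string can be pasted into whichever local inference front end is used to run the GGUF model. Keeping the instruction as a constant makes it easy to compare models on identical wording.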