FredZhang7/claudegpt-code-logic-debugger-v0.1

@@ -28,6 +28,17 @@ IQ here refers to Imatrix Quantization. For performance comparison against regul
 Models were evaluated on two programming tasks: debugging and code generation. The rankings are somewhat subjective. `DeepSeekV2 Coder Instruct` is ranked lower because its privacy policy says that they may collect "text input, prompt", and there is no way around it.
 | **Rank** | **Model Name**                               | **Token Speed (tokens/s)** | **Debugging Performance**                                             | **Code Generation Performance**                                      | **Notes**                                                                                 |
 |----------|----------------------------------------------|----------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
 | 1        | codestral-22b-v0.1-IQ6_K.gguf (this model)   | 34.21                       | Excellent at complex debugging, often surpasses GPT-4o and Claude-3.5  | Good, but may not be on par with GPT-4o                               | Best overall for debugging in my workflow; use Balanced Mode.                             |
@@ -41,15 +52,6 @@ Evaluated on two programming tasks: debugging and generation. It may be a bit su
 | 9        | Trinity-2-Codestral-22B-Q6_K_L               | N/A                         | Poor, similar issues to DeepSeekV2 in outputting the same code          | Decent, but often repeats code                                         | Similar problem to DeepSeekV2, not recommended for my complex tasks.                      |
 | 10       | DeepSeekV2 Coder Lite Instruct Q_8L          | N/A                         | Poor, repeats code similar to other models in its family                | Not as effective in my context                                         | Not recommended overall based on my criteria.                                             |
-Code debugging prompt template used:
-```
-<code>
-<current output>
-<the problem description of the current output>
-<expected output (in English is fine)>
-<any hints>
-Think step by step. Solve this problem without removing any existing functionalities, logic, or checks, except any incorrect code that interferes with your edits.
-```
 <br>

 Models were evaluated on two programming tasks: debugging and code generation. The rankings are somewhat subjective. `DeepSeekV2 Coder Instruct` is ranked lower because its privacy policy says that they may collect "text input, prompt", and there is no way around it.
+Code debugging prompt template used:
+```
+<code>
+<current output>
+<the problem description of the current output>
+<expected output (in English is fine)>
+<any hints>
+Think step by step. Solve this problem without removing any existing functionalities, logic, or checks, except any incorrect code that interferes with your edits.
+```
 | **Rank** | **Model Name**                               | **Token Speed (tokens/s)** | **Debugging Performance**                                             | **Code Generation Performance**                                      | **Notes**                                                                                 |
 |----------|----------------------------------------------|----------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
 | 1        | codestral-22b-v0.1-IQ6_K.gguf (this model)   | 34.21                       | Excellent at complex debugging, often surpasses GPT-4o and Claude-3.5  | Good, but may not be on par with GPT-4o                               | Best overall for debugging in my workflow; use Balanced Mode.                             |
 | 9        | Trinity-2-Codestral-22B-Q6_K_L               | N/A                         | Poor, similar issues to DeepSeekV2 in outputting the same code          | Decent, but often repeats code                                         | Similar problem to DeepSeekV2, not recommended for my complex tasks.                      |
 | 10       | DeepSeekV2 Coder Lite Instruct Q_8L          | N/A                         | Poor, repeats code similar to other models in its family                | Not as effective in my context                                         | Not recommended overall based on my criteria.                                             |
 <br>
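
As a rough illustration of how the code debugging prompt template might be filled in programmatically, here is a minimal Python sketch. The function name `build_debug_prompt`, its parameter names, and the choice to skip empty slots are assumptions made for this example and are not part of the model card; only the slot order and the closing instruction come from the template itself.

```python
# Closing instruction copied from the prompt template in the model card.
INSTRUCTION = (
    "Think step by step. Solve this problem without removing any existing "
    "functionalities, logic, or checks, except any incorrect code that "
    "interferes with your edits."
)


def build_debug_prompt(code, current_output, problem, expected_output, hints=""):
    """Fill the five template slots in order, then append the fixed instruction.

    Hypothetical helper: the name and signature are illustrative only.
    """
    sections = [code, current_output, problem, expected_output, hints]
    # Assumption: empty slots (for example, no hints) are skipped entirely.
    filled = [s.strip() for s in sections if s and s.strip()]
    return "\n".join(filled + [INSTRUCTION])


# Example usage with a deliberately buggy function.
prompt = build_debug_prompt(
    code="def add(a, b):\n    return a - b",
    current_output="add(2, 3) returns -1",
    problem="The result is a difference, not a sum.",
    expected_output="add(2, 3) should return 5",
)
print(prompt)
```

The resulting string can be passed as the user message to whichever local runtime is serving the GGUF file. Keeping the instruction as a constant keeps the wording identical across runs, which may make comparisons between models a little less noisy.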