---
license: apache-2.0
---

# Code Debugger v0.1

Hardware requirement: >= 24 GB of VRAM (e.g. an RTX 3090) to run the models below at roughly ChatGPT (GPT-4o) level inference speed.

Note: the following results are based only on my day-to-day workflows. My goal was to run private models that could beat GPT-4o and Claude-3.5 at code debugging and generation, both to "load balance" between OpenAI/Anthropic's free plans and local models (to avoid hitting rate limits) and to upload as few lines of my code and ideas to their servers as possible.

By a complex debugging task, I mean scenarios where you build library A on top of library B, which requires library C as a dependency, but the root cause is a variable in library C. In such cases, the following workflow guided me to correctly identify the problem.
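As a toy illustration of that scenario (all class and variable names below are hypothetical, invented for this sketch), the error surfaces in A while the actual fix is a single variable inside C:

```python
# "Library C": defines a module-level default that is silently wrong.
class LibC:
    MAX_RETRIES = 0  # root cause: 0 means the retry loop never runs

    def fetch(self):
        for _ in range(self.MAX_RETRIES):
            return "data"
        raise RuntimeError("no attempts made")


# "Library B": built on C, just forwards the call.
class LibB:
    def __init__(self):
        self.c = LibC()

    def load(self):
        return self.c.fetch()


# "Library A": built on B; the traceback starts here, far from the cause.
class LibA:
    def run(self):
        return LibB().load()


# The fix is not in A or B, but in C's variable:
LibC.MAX_RETRIES = 3
print(LibA().run())  # prints "data"
```

A model that only looks at library A's code cannot find this; the prompt has to carry enough of the dependency chain for the model to trace the failure down to C.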

<br>

## Personal Preference Ranking

Evaluated on two programming tasks: debugging and generation. The ranking may be somewhat subjective. `DeepSeekV2 Coder Instruct` is ranked lower because its privacy policy states that it may collect "text input, prompt", with no way to opt out.

| **Rank** | **Model Name** | **Token Speed (tokens/s)** | **Debugging Performance** | **Code Generation Performance** | **Notes** |
|----------|----------------|----------------------------|---------------------------|---------------------------------|-----------|
| 1 | codestral-22b-v0.1-IQ6_K.gguf (this model) | 34.21 | Excellent at complex debugging; often surpasses GPT-4o and Claude-3.5 | Good, but may not be on par with GPT-4o | Best overall for debugging in my workflow; use Balanced Mode. |
| 2 | Claude-3.5-Sonnet | N/A | Poor at complex debugging compared to Codestral | Excellent; better than GPT-4o at long code generation | Great for code generation, but weaker at debugging. |
| 3 | GPT-4o | N/A | Good at complex debugging, but can be outperformed by Codestral | Excellent; generally reliable for code generation | Balanced performance between debugging and generation. |
| 4 | DeepSeekV2 Coder Instruct | N/A | Poor; outputs the same code in complex scenarios | Great at general code generation; rivals GPT-4o | Excellent at code generation, but has data-privacy concerns per its Privacy Policy. |
| 5 | qwen2 7b instruct bf16 | 78.22 | Average; can think of correct approaches | Sometimes helps generate new ideas | High speed; useful for generating ideas. |
| 6 | GPT-4o-mini | N/A | Decent, but struggles with complex debugging tasks | Reliable for shorter or simpler code generation tasks | Suitable for less complex coding tasks. |
| 7 | AutoCoder.IQ4_K.gguf | 26.43 | Average; offers different approaches, but they can be incorrect | Generates useful short code segments | Use Precise Mode for better results. |
| 8 | Meta-Llama-3.1-70B-Instruct-IQ2_XS.gguf | 2.55 | Poor; too slow to be practical in day-to-day workflows | Occasionally helps generate ideas | Speed is a significant limitation. |
| 9 | Trinity-2-Codestral-22B-Q6_K_L | N/A | Poor; similar issues to DeepSeekV2 in debugging | Decent, but often repeats code | Same problem as DeepSeekV2; not recommended for my complex tasks. |
| 10 | DeepSeekV2 Coder Lite Instruct Q_8L | N/A | Poor; repeats code like other models in its family | Not as effective in my context | Not recommended overall based on my criteria. |

Prompt format:
```
<code>
<current output>
<the problem description of the current output>
<expected output (in English is fine)>
<any hints>
Think step by step. Solve this problem without removing any existing functionalities, logic, or checks, except any incorrect code that interferes with your edits.
```
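As an illustrative sketch of filling in this template programmatically (the `build_debug_prompt` helper and its parameter names are my own, not part of the model card), the sections are simply concatenated in order:

```python
# Hypothetical helper that assembles the debugging prompt template above.
def build_debug_prompt(code, current_output, problem, expected, hints=""):
    parts = [
        code,
        current_output,
        problem,
        expected,
        hints,
        "Think step by step. Solve this problem without removing any "
        "existing functionalities, logic, or checks, except any incorrect "
        "code that interferes with your edits.",
    ]
    # Drop empty sections (e.g. when there are no hints).
    return "\n".join(p for p in parts if p)


prompt = build_debug_prompt(
    code="def add(a, b): return a - b",
    current_output="add(2, 2) == 0",
    problem="The function subtracts instead of adding.",
    expected="add(2, 2) should return 4.",
)
print(prompt)
```

The closing instruction matters: without it, models tend to "simplify" the snippet by deleting checks that are unrelated to the bug.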

<br>

## Debugging with Reflection

The following are personal opinions.

In general, if there's an error in the code, pasting the last few rows of the stack trace into the LLM seems to work.
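For example, a small stdlib-only sketch of trimming a traceback to its last few rows before pasting it into a prompt (the function name is my own):

```python
import traceback


def last_rows_of_traceback(n=5):
    """Return the last n lines of the current exception's traceback."""
    lines = traceback.format_exc().rstrip().splitlines()
    return "\n".join(lines[-n:])


try:
    {}["missing"]  # provoke a KeyError for demonstration
except KeyError:
    snippet = last_rows_of_traceback(3)
    print(snippet)  # last line is: KeyError: 'missing'
```

The last rows usually name the failing file, line, and exception, which is what the model needs; the earlier frames mostly add noise and tokens.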

Adding "Now, reflect." sometimes allows Claude-3.5-Sonnet to generate the correct solution.
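The exchange can be sketched as a generic chat-message list (the message contents are placeholders and no specific client API is assumed; the reflection turn is just appended as a follow-up user message):

```python
# Two-turn "reflection" pattern: send the debugging prompt, keep the
# model's first attempt in the history, then ask it to reflect.
messages = [
    {"role": "user", "content": "<debugging prompt>"},
]
# ...after receiving the model's first answer:
messages.append({"role": "assistant", "content": "<model's first attempt>"})
messages.append({"role": "user", "content": "Now, reflect."})
```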