add readme

ffe845b verified 12 months ago

5.61 kB

	---
	license: apache-2.0
	---

	# Code Debugger v0.1

	Hardware requirements for ChatGPT GPT-4o level inference speed for the following models on an RTX 3090: >=24 GB VRAM.

	Note: The following results are based on my day-to-day workflows only. My goal was to run private models that could beat GPT-4o and Claude-3.5 in code debugging and generation to ‘load balance’ between OpenAI/Anthropic’s free plan and local models to avoid hitting rate limits, and to upload as few lines of my code and ideas to their servers as possible.

	By a complex debugging task, I mean scenarios where you build library A on top of library B that requires library C as a dependency but the root cause was a variable in library C. In this case, the following workflow guided me to correctly identify the problem.

	<br>

	## Personal Preference Ranking

	Evaluated on two programming tasks: debugging and generation. It may be a bit subjective. `DeepSeekV2 Coder Instruct` is ranked lower because their privacy policy says that they may collect "text input, prompt" and there's no way around it.

	\| Rank \| Model Name \| Token Speed (tokens/s) \| Debugging Performance \| Code Generation Performance \| Notes \|
	\|----------\|----------------------------------------------\|----------------------------\|------------------------------------------------------------------------\|-----------------------------------------------------------------------\|-------------------------------------------------------------------------------------------\|
	\| 1 \| codestral-22b-v0.1-IQ6_K.gguf (this model) \| 34.21 \| Excellent at complex debugging, often surpasses GPT-4o and Claude-3.5 \| Good, but may not be par with GPT-4o \| Best overall for debugging in my workflow, use Balanced Mode. \|
	\| 2 \| Claude-3.5-Sonnet \| N/A \| Poor in complex debugging compared to Codestral \| Excellent, better than GPT-4o in long code generation \| Great for code generation, but weaker in debugging. \|
	\| 3 \| GPT-4o \| N/A \| Good at complex debugging but can be outperformed by Codestral \| Excellent, generally reliable for code generation \| Balanced performance between code debugging and generation. \|
	\| 4 \| DeepSeekV2 Coder Instruct \| N/A \| Poor, outputs the same code in complex scenarios \| Great at general code generation, rivals GPT-4o \| Excellent at code generation, but has data privacy concerns as per Privacy Policy. \|
	\| 5 \| qwen2 7b instruct bf16 \| 78.22 \| Average, can think of correct approaches \| Sometimes helps generate new ideas \| High speed, useful for generating ideas. \|
	\| 6 \| GPT-4o-mini \| N/A \| Decent, but struggles with complex debugging tasks \| Reliable for shorter or simpler code generation tasks \| Suitable for less complex coding tasks. \|
	\| 7 \| AutoCoder.IQ4_K.gguf \| 26.43 \| Average, offers different approaches but can be incorrect \| Generates useful short code segments \| Use Precise Mode for better results. \|
	\| 8 \| Meta-Llama-3.1-70B-Instruct-IQ2_XS.gguf \| 2.55 \| Poor, too slow to be practical in day-to-day workflows \| Occasionally helps generate ideas \| Speed is a significant limitation. \|
	\| 9 \| Trinity-2-Codestral-22B-Q6_K_L \| N/A \| Poor, similar issues to DeepSeekV2 in debugging \| Decent, but often repeats code \| Similar problem to DeepSeekV2, not recommended for my complex tasks. \|
	\| 10 \| DeepSeekV2 Coder Lite Instruct Q_8L \| N/A \| Poor, repeats code similar to other models in its family \| Not as effective in my context \| Not recommended overall based on my criteria. \|

	Prompt format:
	```
	<code>
	<current output>
	<the problem description of the current output>
	<expected output (in English is fine)>
	<any hints>
	Think step by step. Solve this problem without removing any existing functionalities, logic, or checks, except any incorrect code that interferes with your edits.
	```

	<br>

	## Debugging with Reflection

	The following are personal opinions.

	In general, if there's an error in the code, copy pasting the last few rows of stacktrace to the LLM seems to work.

	Adding "Now, reflect." sometimes allows Claude-3.5-Sonnet to generate the correct solution.