Add download, generation mode

README.md


IQ here refers to Imatrix Quantization. For a performance comparison against regular GGUF quants, please read [this Reddit post](https://www.reddit.com/r/LocalLLaMA/comments/1993iro/ggufs_quants_can_punch_above_their_weights_now/).
<br>
| 6 | GPT-4o-mini | N/A | Decent, but struggles with complex debugging tasks | Reliable for shorter or simpler code generation tasks | Suitable for less complex coding tasks. |
| 7 | AutoCoder.IQ4_K.gguf | 26.43 | Average, offers different approaches but can be incorrect | Generates useful short code segments | Use Precise Mode for better results. |
| 8 | Meta-Llama-3.1-70B-Instruct-IQ2_XS.gguf | 2.55 | Poor, too slow to be practical in day-to-day workflows | Occasionally helps generate ideas | Speed is a significant limitation. |
| 9 | Trinity-2-Codestral-22B-Q6_K_L | N/A | Poor, similar issues to DeepSeekV2 in outputting the same code | Decent, but often repeats code | Similar problem to DeepSeekV2; not recommended for my complex tasks. |
| 10 | DeepSeekV2 Coder Lite Instruct Q_8L | N/A | Poor, repeats code similar to other models in its family | Not as effective in my context | Not recommended overall based on my criteria. |
Code debugging prompt template used:
```
<code>
<current output>
...
Think step by step. Solve this problem without removing any existing functionality.
```

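To make the placeholders concrete, here is a sketch of how such a template might be filled before being sent to a model. The `build_debug_prompt` helper, the exact template text, and the toy inputs are illustrative assumptions, not part of the original workflow:

```python
# Hypothetical helper: fills the <code> and <current output> placeholders of a
# debugging prompt template similar to the one shown above.
DEBUG_TEMPLATE = """{code}

{current_output}

Think step by step. Solve this problem without removing any existing functionality."""


def build_debug_prompt(code: str, current_output: str) -> str:
    """Substitute the code snippet and its observed output into the template."""
    return DEBUG_TEMPLATE.format(code=code, current_output=current_output)


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a - b"        # toy buggy function
    observed = "add(2, 3) returned -1, expected 5"      # toy observed output
    print(build_debug_prompt(snippet, observed))
```
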
<br>
## Generation Kwargs
Balanced Mode:
```python
generation_kwargs = {
    "max_tokens": 8192,
    "stop": ["<|EOT|>", "</s>", "<|end▁of▁sentence|>", "<eos>", "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"],
    "temperature": 0.7,
    "stream": True,
    "top_k": 50,
    "top_p": 0.95,
}
```
Precise Mode:
```python
generation_kwargs = {
    "max_tokens": 8192,
    "stop": ["<|EOT|>", "</s>", "<|end▁of▁sentence|>", "<eos>", "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"],
    "temperature": 0.0,
    "stream": True,
    "top_p": 1.0,
}
```
Qwen2 7B:
```python
generation_kwargs = {
    "max_tokens": 8192,
    "stop": ["<|EOT|>", "</s>", "<|end▁of▁sentence|>", "<eos>", "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"],
    "temperature": 0.4,
    "stream": True,
    "top_k": 20,
    "top_p": 0.8,
}
```
Other variations of temperature, top_k, and top_p were also tested 5-8 times per model, but I'm sticking with the three presets above.
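
As a usage sketch, these kwargs could be passed to a llama-cpp-python completion call roughly as below. The runtime (llama-cpp-python), the model path, and the prompt are assumptions for illustration; only the kwargs themselves come from the presets above:

```python
from llama_cpp import Llama

# Placeholder path; any of the GGUF files from the table above could be used here.
llm = Llama(model_path="./AutoCoder.IQ4_K.gguf", n_ctx=8192)

# Balanced Mode preset; swap in the Precise or Qwen2 7B values as needed.
generation_kwargs = {
    "max_tokens": 8192,
    "stop": ["<|EOT|>", "</s>", "<|end▁of▁sentence|>", "<eos>", "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"],
    "temperature": 0.7,
    "stream": True,
    "top_k": 50,
    "top_p": 0.95,
}

# With stream=True, create_completion returns an iterator of partial chunks.
prompt = "Fix the bug in this function:\n<code>"  # placeholder prompt
for chunk in llm.create_completion(prompt, **generation_kwargs):
    print(chunk["choices"][0]["text"], end="", flush=True)
```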
<br>
## New Discoveries
The following have been tested in my workflow, but may not generalize well to other workflows.
- In general, if there's an error in the code, copy-pasting the last few lines of the stack trace to the LLM seems to work (see the sketch after this list).
- Adding "Now, reflect." sometimes allows Claude-3.5-Sonnet to generate the correct solution.
- If GPT-4o reasons correctly in its first response and the conversation is then handed off to GPT-4o-mini, the mini model can maintain a comparable level of reasoning/accuracy to GPT-4o (the sketch below also illustrates this hand-off).
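
The first and third points above can be combined into one minimal sketch using the OpenAI Python client. The prompt wording, the traceback handling, and the follow-up question are illustrative assumptions; only the model names and the hand-off idea come from the notes above:

```python
import traceback
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

try:
    risky_function()  # hypothetical call that raises an exception
except Exception:
    # Keep only the last few lines of the stack trace, as suggested above.
    trace_tail = "\n".join(traceback.format_exc().splitlines()[-5:])

messages = [{"role": "user",
             "content": f"My code raises this error:\n{trace_tail}\nHow do I fix it?"}]

# Let GPT-4o handle the first round of reasoning.
first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Hand the same conversation to GPT-4o-mini for the follow-up turns.
messages.append({"role": "user", "content": "Apply that fix and show the corrected function."})
followup = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(followup.choices[0].message.content)
```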
<br>
## Download

```
pip install -U "huggingface_hub[cli]"
```

```
huggingface-cli download FredZhang7/claudegpt-code-debugger-v0.1 --include "codestral-22b-v0.1-IQ6_K.gguf" --local-dir ./
```
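
If you prefer Python over the CLI, the same file can be fetched with `hf_hub_download`. This is a sketch using the `huggingface_hub` Python API; the original instructions only show the CLI route:

```python
from huggingface_hub import hf_hub_download

# Python equivalent of the CLI command above.
path = hf_hub_download(
    repo_id="FredZhang7/claudegpt-code-debugger-v0.1",
    filename="codestral-22b-v0.1-IQ6_K.gguf",
    local_dir="./",
)
print(path)
```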
|