add Throughput, New Discoveries
README.md
@@ -8,7 +8,15 @@ Hardware requirements for ChatGPT GPT-4o level inference speed for the following
Note: The following results are based on my day-to-day workflows only. My goal was to run private models that could beat GPT-4o and Claude-3.5 at code debugging and generation, so I could ‘load balance’ between OpenAI/Anthropic’s free plans and local models to avoid hitting rate limits, and upload as few lines of my code and ideas to their servers as possible.
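
The ‘load balance’ here can be as simple as a rate-limit fallback. Below is a minimal sketch of that routing, not from this README: it assumes the hosted side is OpenAI's Python SDK and the local side is any llama.cpp-style server exposing an OpenAI-compatible endpoint; the URL and model names are placeholders.

```python
from openai import OpenAI, RateLimitError

hosted = OpenAI()  # reads OPENAI_API_KEY from the environment
# Assumption: a local runtime (e.g. llama.cpp's server) serving an
# OpenAI-compatible API; the URL and model name below are placeholders.
local = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

def ask(messages):
    """Prefer the hosted model; fall back to the local model on a rate limit."""
    try:
        resp = hosted.chat.completions.create(model="gpt-4o", messages=messages)
    except RateLimitError:
        resp = local.chat.completions.create(model="local-model", messages=messages)
    return resp.choices[0].message.content
```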

An example of a complex debugging scenario is one where you build library A on top of library B, which requires library C as a dependency, but the root cause of the bug is a variable in library C. In this case, the following workflow guided me to correctly identify the problem.
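
As a toy illustration of that failure chain (every file, function, and variable name below is invented for this sketch), the deepest frames of the traceback, not your own code in library A, are what name the real culprit:

```python
# --- lib_c.py: the transitive dependency; the root-cause variable lives here
MAX_RETRIES = None  # root cause: never set to a number

def fetch():
    # TypeError: 'NoneType' object cannot be interpreted as an integer
    return [attempt for attempt in range(MAX_RETRIES)]

# --- lib_b.py: the library you build on, which depends on lib_c
def load():
    return fetch()  # stands in for lib_c.fetch()

# --- lib_a.py: your code, built on top of lib_b
def run():
    return load()  # stands in for lib_b.load()

run()  # the last traceback frames point into lib_c
```

The last few frames of that traceback are exactly the part worth pasting to the LLM (see New Discoveries below).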

<br>

## Throughput

![Throughput chart](https://github.com/user-attachments/assets/71ee3f9a-3429-4671-b8a9-031b42a63b68)

IQ in model names means Imatrix Quantization. For a performance comparison against regular GGUF quants, please read [this Reddit post](https://www.reddit.com/r/LocalLLaMA/comments/1993iro/ggufs_quants_can_punch_above_their_weights_now/).
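
To reproduce a tokens-per-second figure like the ones charted above, a minimal timing loop is enough. The sketch below uses the llama-cpp-python bindings; the GGUF filename is a placeholder, and this is not necessarily the setup behind the chart.

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path to an Imatrix-quantized GGUF; n_gpu_layers=-1
# offloads all layers to the GPU when one is available.
llm = Llama(model_path="model-IQ4_XS.gguf", n_gpu_layers=-1, verbose=False)

start = time.perf_counter()
out = llm("Explain what a stack trace is.", max_tokens=256)
elapsed = time.perf_counter() - start

n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.1f} tok/s")
```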

<br>

@@ -41,10 +49,12 @@ Think step by step. Solve this problem without removing any existing functionality

<br>

## New Discoveries

The following are tested, but may not generalize well to other workflows.

- In general, if there's an error in the code, copy-pasting the last few lines of the stack trace to the LLM seems to work.
- Adding "Now, reflect." sometimes allows Claude-3.5-Sonnet to generate the correct solution.
- If GPT-4o reasons correctly in its first response and the conversation is then sent to GPT-4o-mini, the mini model can maintain a comparable level of reasoning/accuracy to GPT-4o, as sketched below.
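
A minimal sketch of that hand-off with the OpenAI Python SDK (the prompts and the point where you switch models are placeholders):

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Find the bug in this function: ..."}]  # placeholder

# First turn: let the stronger model do the initial reasoning.
first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Later turns: hand the same conversation to the cheaper model, which
# tends to keep a similar level of accuracy with the strong first
# response already in context.
messages.append({"role": "user", "content": "Now fix the second function too."})
rest = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(rest.choices[0].message.content)
```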

<br>