FredZhang7 committed
Commit e96d022 · verified · 1 Parent(s): ffe845b

add Throughput, New Discoveries

Files changed (1):
README.md (+15 -5)
README.md CHANGED
@@ -8,7 +8,15 @@ Hardware requirements for ChatGPT GPT-4o level inference speed for the following

Note: The following results are based on my day-to-day workflows only. My goal was to run private models that could beat GPT-4o and Claude-3.5 in code debugging and generation, to ‘load balance’ between OpenAI/Anthropic’s free plans and local models to avoid hitting rate limits, and to upload as few lines of my code and ideas to their servers as possible.

- By a complex debugging task, I mean scenarios where you build library A on top of library B that requires library C as a dependency but the root cause was a variable in library C. In this case, the following workflow guided me to correctly identify the problem.
+ An example of a complex debugging scenario is one where you build library A on top of library B, which requires library C as a dependency, but the root cause is a variable in library C. In this case, the following workflow guided me to correctly identify the problem.
+
+ <br>
+
+ ## Throughput
+
+ ![](./model_v0.1_throughput_comparison.png)
+
+ IQ in model names means Imatrix Quantization. For a performance comparison against regular GGUF quants, please read [this Reddit post](https://www.reddit.com/r/LocalLLaMA/comments/1993iro/ggufs_quants_can_punch_above_their_weights_now/).

<br>

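To reproduce the tokens/s numbers behind a throughput chart like the one added above, here is a minimal sketch assuming llama-cpp-python; the GGUF filename is a hypothetical placeholder, so substitute whichever quant you are benchmarking:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical filename: point this at the GGUF/IQ quant under test.
llm = Llama(model_path="model-v0.1-IQ4_XS.gguf", n_gpu_layers=-1, verbose=False)

prompt = "Write a Python function that merges two sorted lists."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# The completion dict follows the OpenAI-style schema, including token usage.
tokens = out["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tokens/s")
```

Averaging several runs with the same prompt and `max_tokens` gives a fairer comparison across quants than a single generation.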
@@ -41,10 +49,12 @@ Think step by step. Solve this problem without removing any existing functionali

<br>

- ## Debugging with Reflection
+ ## New Discoveries

- The following are personal opinions.
+ The following are tested, but may not generalize well to other workflows.

- In general, if there's an error in the code, copy pasting the last few rows of stacktrace to the LLM seems to work.
+ - In general, if there's an error in the code, copy-pasting the last few lines of the stack trace to the LLM seems to work (sketch below).
+ - Adding "Now, reflect." sometimes allows Claude-3.5-Sonnet to generate the correct solution.
+ - If GPT-4o reasons correctly in its first response and the conversation is then handed off to GPT-4o-mini, the mini model can maintain a level of reasoning/accuracy comparable to GPT-4o (see the handoff sketch below).

- Adding "Now, reflect." sometimes allows Claude-3.5-Sonnet to generate the correct solution.
+ <br>
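On the stack-trace bullet: a minimal sketch of grabbing only the tail of a traceback to paste into a prompt. `library_a` is a hypothetical stand-in for the A-on-B-on-C dependency chain described earlier:

```python
import traceback

def stacktrace_tail(n_lines: int = 10) -> str:
    """Return the last n_lines of the current exception's traceback,
    which is usually enough context to paste into an LLM prompt."""
    return "\n".join(traceback.format_exc().splitlines()[-n_lines:])

try:
    import library_a  # hypothetical: A builds on B, which depends on C
    library_a.run()
except Exception:
    print(stacktrace_tail())
```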
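On the GPT-4o to GPT-4o-mini bullet: a minimal sketch of the handoff, assuming the official openai Python SDK; the prompts are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [{"role": "user", "content": "Debug this: <code + stack trace tail here>"}]

# First turn: let GPT-4o do the heavy reasoning.
first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up turns: hand the same conversation to the cheaper mini model.
messages.append({"role": "user", "content": "Apply the same fix to the other call sites."})
second = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(second.choices[0].message.content)
```

Because the full conversation, including GPT-4o's first answer, stays in `messages`, the mini model can build on that reasoning rather than starting from scratch.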