Huanzhi Mao
committed on
Commit 8a12377 · 1 Parent(s): 64cee2d
update data.csv
data.csv CHANGED
@@ -1,29 +1,29 @@
Rank,Overall Acc,Model,Model Link,Organization,License,AST Summary,Exec Summary,Simple Function AST,Python Simple Function AST,Java Simple Function AST,JavaScript Simple Function AST,Multiple Functions AST,Parallel Functions AST,Parallel Multiple AST,Simple Function Exec,Python Simple Function Exec,REST Simple Function Exec,Multiple Functions Exec,Parallel Functions Exec,Parallel Multiple Exec,Relevance Detection,Cost ($ Per 1k Function Calls),Latency Mean (s),Latency Standard Deviation (s),,
-
1,79.35%,Claude-3-Opus-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,79.02%,68.13%,83.09%,89.25%,64.00%,72.00%,90.50%,80.00%,62.50%,83.53%,84.00%,82.86%,74.00%,70.00%,45.00%,80.83%,10.81,5.67,1.43
-
2,78.71%,GPT-4-0125-Preview (Prompt),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,81.16%,67.36%,83.64%,89.00%,65.00%,78.00%,90.00%,86.50%,64.50%,82.94%,83.00%,82.86%,76.00%,68.00%,42.50%,69.17%,5.22,2.1,1.4
-
3,78.35%,Gorilla-OpenFunctions-v2 (FC),https://gorilla.cs.berkeley.edu/blogs/7_open_functions_v2.html,Gorilla LLM,Apache 2.0,81.94%,69.64%,83.27%,93.00%,53.00%,66.00%,93.00%,85.50%,66.00%,87.06%,83.00%,92.86%,76.00%,68.00%,47.50%,60.83%,1.53,2.38,2.05
-
4,78.18%,GPT-4-0125-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,81.23%,63.88%,78.91%,88.75%,55.00%,48.00%,91.00%,88.50%,66.50%,70.00%,82.00%,52.86%,68.00%,70.00%,47.50%,81.67%,4.76,4.37,5.31
-
5,77.71%,GPT-4-1106-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,78.86%,67.10%,77.45%,87.25%,48.00%,58.00%,88.00%,88.00%,62.00%,79.41%,81.00%,77.14%,74.00%,70.00%,45.00%,80.83%,4.95,6.45,6.45
-
6,73.82%,Claude-3-Sonnet-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,78.86%,66.07%,79.45%,88.25%,52.00%,64.00%,85.50%,85.00%,65.50%,75.29%,74.00%,77.14%,74.00%,70.00%,45.00%,53.33%,2.13,2.38,0.66
-
7,72.24%,Mistral-Medium-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,70.31%,56.79%,74.73%,82.75%,54.00%,52.00%,73.50%,79.50%,53.50%,67.65%,77.00%,54.29%,68.00%,64.00%,27.50%,88.33%,1.75,3.41,4.65
-
8,70.71%,Functionary-Small (FC),https://huggingface.co/meetkai/functionary-small-v2.2,MeetKai,MIT,72.72%,60.75%,72.36%,83.25%,41.00%,48.00%,86.50%,76.50%,55.50%,60.00%,81.00%,30.00%,72.00%,66.00%,45.00%,74.17%,1.94,3.02,3.2
-
9,69.18%,Claude-instant-1.2 (Prompt),https://www.anthropic.com/news/releasing-claude-instant-1-2,Anthropic,Proprietary,71.17%,61.37%,76.18%,83.75%,53.00%,62.00%,84.00%,78.50%,46.00%,76.47%,81.00%,70.00%,66.00%,58.00%,45.00%,54.17%,0.95,1.75,1
-
10,68.24%,Claude-3-Opus-20240229 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,69.17%,61.85%,70.18%,81.25%,38.00%,46.00%,74.50%,75.00%,57.00%,75.88%,81.00%,68.57%,66.00%,68.00%,37.50%,62.50%,26.58,14.82,6
-
11,66.00%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,73.30%,29.07%,76.18%,82.75%,57.00%,62.00%,83.50%,81.00%,52.50%,25.29%,1.00%,60.00%,6.00%,60.00%,25.00%,76.25%,1.04,1.62,1.75
-
12,64.06%,Claude-3-Sonnet-20240229 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,66.52%,59.26%,69.09%,78.50%,38.00%,56.00%,70.50%,72.50%,54.00%,63.53%,83.00%,35.71%,68.00%,68.00%,37.50%,51.67%,5.08,6.7,3.07
-
13,63.65%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,59.47%,45.68%,74.36%,80.00%,59.00%,60.00%,72.50%,55.00%,36.00%,48.24%,61.00%,30.00%,66.00%,36.00%,32.50%,83.33%,6.62,3.2,1.42
-
14,62.53%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,57.60%,47.33%,64.91%,88.00%,3.00%,4.00%,88.50%,22.50%,54.50%,68.82%,75.00%,60.00%,72.00%,6.00%,42.50%,84.17%,4.74,2.91,2.73
-
15,62.00%,Claude-3-Haiku-20240307 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,71.22%,57.65%,74.36%,84.25%,49.00%,46.00%,76.00%,76.00%,58.50%,50.59%,82.00%,5.71%,72.00%,68.00%,40.00%,21.67%,0.44,3.81,2.32
-
16,60.65%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,62.20%,59.25%,61.82%,72.75%,26.00%,46.00%,73.00%,70.00%,44.00%,60.00%,77.00%,35.71%,72.00%,60.00%,45.00%,54.58%,1.24,0.62,0.41
-
17,57.53%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,63.89%,50.25%,78.55%,86.75%,56.00%,58.00%,89.50%,32.50%,55.00%,80.00%,83.00%,75.71%,74.00%,2.00%,45.00%,0.00%,3.86,1.99,2.53
-
18,56.65%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,66.31%,66.38%,56.73%,57.75%,51.00%,60.00%,65.50%,86.00%,57.00%,83.53%,78.00%,91.43%,70.00%,72.00%,40.00%,2.08%,0.42,1.53,1.25
-
19,55.82%,Gemini-1.0-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,40.92%,38.68%,76.18%,87.50%,43.00%,52.00%,87.50%,0.00%,0.00%,78.24%,83.00%,71.43%,74.00%,0.00%,2.50%,76.67%,0.18,1.05,0.23
-
20,54.82%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,58.92%,46.38%,78.18%,86.00%,56.00%,60.00%,90.00%,36.50%,31.00%,80.00%,84.00%,74.29%,76.00%,12.00%,17.50%,0.00%,0.95,1.14,1.15
-
21,50.18%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,55.36%,56.95%,65.45%,70.50%,49.00%,58.00%,81.00%,35.00%,40.00%,55.29%,78.00%,22.86%,72.00%,58.00%,42.50%,2.08%,1.22,1.9,1.57
-
22,49.88%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,37.56%,32.19%,66.73%,85.75%,13.00%,22.00%,83.50%,0.00%,0.00%,61.76%,73.00%,45.71%,62.00%,0.00%,5.00%,73.33%,0.59,0.92,0.8
-
23,47.94%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,36.57%,25.10%,59.27%,80.50%,3.00%,2.00%,87.00%,0.00%,0.00%,39.41%,55.00%,17.14%,56.00%,0.00%,5.00%,90.83%,10.27,3.64,3.99
-
24,46.35%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,41.98%,27.32%,48.91%,60.25%,28.00%,0.00%,48.50%,44.50%,26.00%,21.76%,36.00%,1.43%,28.00%,42.00%,17.50%,84.17%,0.13,1.14,1.19
-
25,39.29%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,38.10%,27.93%,40.91%,46.25%,28.00%,24.00%,49.00%,29.50%,33.00%,24.71%,38.00%,5.71%,32.00%,30.00%,25.00%,57.08%,0.03,0.09,N/A
-
26,37.12%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,33.51%,30.05%,36.55%,45.25%,7.00%,26.00%,41.00%,33.00%,23.50%,34.71%,39.00%,28.57%,38.00%,30.00%,17.50%,56.25%,0.45,1.2,N/A
-
27,29.76%,Glaive-v1 (FC),https://huggingface.co/glaiveai/glaive-function-calling-v1,Glaive,cc-by-sa-4.0,28.84%,0.00%,52.36%,70.25%,3.00%,8.00%,63.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,38.33%,0.02,0.06,N/A
-
28,17.82%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.52%,7.65%,1.09%,1.50%,0.00%,0.00%,3.00%,4.00%,2.00%,20.59%,18.00%,24.29%,10.00%,0.00%,0.00%,99.58%,1.99,2.82,1.43
Rank,Overall Acc,Model,Model Link,Organization,License,AST Summary,Exec Summary,Simple Function AST,Python Simple Function AST,Java Simple Function AST,JavaScript Simple Function AST,Multiple Functions AST,Parallel Functions AST,Parallel Multiple AST,Simple Function Exec,Python Simple Function Exec,REST Simple Function Exec,Multiple Functions Exec,Parallel Functions Exec,Parallel Multiple Exec,Relevance Detection,Cost ($ Per 1k Function Calls),Latency Mean (s),Latency Standard Deviation (s),,
+
1,79.35%,Claude-3-Opus-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,79.02%,68.13%,83.09%,89.25%,64.00%,72.00%,90.50%,80.00%,62.50%,83.53%,84.00%,82.86%,74.00%,70.00%,45.00%,80.83%,10.81,5.67,1.43
+
2,78.71%,GPT-4-0125-Preview (Prompt),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,81.16%,67.36%,83.64%,89.00%,65.00%,78.00%,90.00%,86.50%,64.50%,82.94%,83.00%,82.86%,76.00%,68.00%,42.50%,69.17%,5.22,2.1,1.4
+
3,78.35%,Gorilla-OpenFunctions-v2 (FC),https://gorilla.cs.berkeley.edu/blogs/7_open_functions_v2.html,Gorilla LLM,Apache 2.0,81.94%,69.64%,83.27%,93.00%,53.00%,66.00%,93.00%,85.50%,66.00%,87.06%,83.00%,92.86%,76.00%,68.00%,47.50%,60.83%,1.53,2.38,2.05
+
4,78.18%,GPT-4-0125-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,81.23%,63.88%,78.91%,88.75%,55.00%,48.00%,91.00%,88.50%,66.50%,70.00%,82.00%,52.86%,68.00%,70.00%,47.50%,81.67%,4.76,4.37,5.31
+
5,77.71%,GPT-4-1106-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,78.86%,67.10%,77.45%,87.25%,48.00%,58.00%,88.00%,88.00%,62.00%,79.41%,81.00%,77.14%,74.00%,70.00%,45.00%,80.83%,4.95,6.45,6.45
+
6,73.82%,Claude-3-Sonnet-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,78.86%,66.07%,79.45%,88.25%,52.00%,64.00%,85.50%,85.00%,65.50%,75.29%,74.00%,77.14%,74.00%,70.00%,45.00%,53.33%,2.13,2.38,0.66
+
7,72.24%,Mistral-Medium-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,70.31%,56.79%,74.73%,82.75%,54.00%,52.00%,73.50%,79.50%,53.50%,67.65%,77.00%,54.29%,68.00%,64.00%,27.50%,88.33%,1.75,3.41,4.65
+
8,70.71%,Functionary-Small (FC),https://huggingface.co/meetkai/functionary-small-v2.2,MeetKai,MIT,72.72%,60.75%,72.36%,83.25%,41.00%,48.00%,86.50%,76.50%,55.50%,60.00%,81.00%,30.00%,72.00%,66.00%,45.00%,74.17%,1.94,3.02,3.2
+
9,69.18%,Claude-instant-1.2 (Prompt),https://www.anthropic.com/news/releasing-claude-instant-1-2,Anthropic,Proprietary,71.17%,61.37%,76.18%,83.75%,53.00%,62.00%,84.00%,78.50%,46.00%,76.47%,81.00%,70.00%,66.00%,58.00%,45.00%,54.17%,0.95,1.75,1
+
10,68.24%,Claude-3-Opus-20240229 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,69.17%,61.85%,70.18%,81.25%,38.00%,46.00%,74.50%,75.00%,57.00%,75.88%,81.00%,68.57%,66.00%,68.00%,37.50%,62.50%,26.58,14.82,6
+
11,66.00%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,73.30%,29.07%,76.18%,82.75%,57.00%,62.00%,83.50%,81.00%,52.50%,25.29%,1.00%,60.00%,6.00%,60.00%,25.00%,76.25%,1.04,1.62,1.75
+
12,64.06%,Claude-3-Sonnet-20240229 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,66.52%,59.26%,69.09%,78.50%,38.00%,56.00%,70.50%,72.50%,54.00%,63.53%,83.00%,35.71%,68.00%,68.00%,37.50%,51.67%,5.08,6.7,3.07
+
13,63.65%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,59.47%,45.68%,74.36%,80.00%,59.00%,60.00%,72.50%,55.00%,36.00%,48.24%,61.00%,30.00%,66.00%,36.00%,32.50%,83.33%,6.62,3.2,1.42
+
14,62.53%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,57.60%,47.33%,64.91%,88.00%,3.00%,4.00%,88.50%,22.50%,54.50%,68.82%,75.00%,60.00%,72.00%,6.00%,42.50%,84.17%,4.74,2.91,2.73
+
15,62.00%,Claude-3-Haiku-20240307 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,71.22%,57.65%,74.36%,84.25%,49.00%,46.00%,76.00%,76.00%,58.50%,50.59%,82.00%,5.71%,72.00%,68.00%,40.00%,21.67%,0.44,3.81,2.32
+
16,60.65%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,62.20%,59.25%,61.82%,72.75%,26.00%,46.00%,73.00%,70.00%,44.00%,60.00%,77.00%,35.71%,72.00%,60.00%,45.00%,54.58%,1.24,0.62,0.41
+
17,57.53%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,63.89%,50.25%,78.55%,86.75%,56.00%,58.00%,89.50%,32.50%,55.00%,80.00%,83.00%,75.71%,74.00%,2.00%,45.00%,0.00%,3.86,1.99,2.53
+
18,56.65%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,66.31%,66.38%,56.73%,57.75%,51.00%,60.00%,65.50%,86.00%,57.00%,83.53%,78.00%,91.43%,70.00%,72.00%,40.00%,2.08%,0.42,1.53,1.25
+
19,55.82%,Gemini-1.0-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,40.92%,38.68%,76.18%,87.50%,43.00%,52.00%,87.50%,0.00%,0.00%,78.24%,83.00%,71.43%,74.00%,0.00%,2.50%,76.67%,0.18,1.05,0.23
+
20,54.82%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,58.92%,46.38%,78.18%,86.00%,56.00%,60.00%,90.00%,36.50%,31.00%,80.00%,84.00%,74.29%,76.00%,12.00%,17.50%,0.00%,0.95,1.14,1.15
+
21,50.18%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,55.36%,56.95%,65.45%,70.50%,49.00%,58.00%,81.00%,35.00%,40.00%,55.29%,78.00%,22.86%,72.00%,58.00%,42.50%,2.08%,1.22,1.9,1.57
+
22,49.88%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,37.56%,32.19%,66.73%,85.75%,13.00%,22.00%,83.50%,0.00%,0.00%,61.76%,73.00%,45.71%,62.00%,0.00%,5.00%,73.33%,0.59,0.92,0.8
+
23,47.94%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,36.57%,25.10%,59.27%,80.50%,3.00%,2.00%,87.00%,0.00%,0.00%,39.41%,55.00%,17.14%,56.00%,0.00%,5.00%,90.83%,10.27,3.64,3.99
+
24,46.35%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,41.98%,27.32%,48.91%,60.25%,28.00%,0.00%,48.50%,44.50%,26.00%,21.76%,36.00%,1.43%,28.00%,42.00%,17.50%,84.17%,0.13,1.14,1.19
+
25,39.29%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,38.10%,27.93%,40.91%,46.25%,28.00%,24.00%,49.00%,29.50%,33.00%,24.71%,38.00%,5.71%,32.00%,30.00%,25.00%,57.08%,0.03,0.09,N/A
+
26,37.12%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,33.51%,30.05%,36.55%,45.25%,7.00%,26.00%,41.00%,33.00%,23.50%,34.71%,39.00%,28.57%,38.00%,30.00%,17.50%,56.25%,0.45,1.2,N/A
+
27,29.76%,Glaive-v1 (FC),https://huggingface.co/glaiveai/glaive-function-calling-v1,Glaive,cc-by-sa-4.0,28.84%,0.00%,52.36%,70.25%,3.00%,8.00%,63.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,38.33%,0.02,0.06,N/A
+
28,17.82%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.52%,7.65%,1.09%,1.50%,0.00%,0.00%,3.00%,4.00%,2.00%,20.59%,18.00%,24.29%,10.00%,0.00%,0.00%,99.58%,1.99,2.82,1.43