BFCL May 14th Release
data.csv
CHANGED
@@ -1,40 +1,45 @@
Rank,Overall Acc,Model,Model Link,Organization,License,AST Summary,Exec Summary,Simple Function AST,Python Simple Function AST,Java Simple Function AST,JavaScript Simple Function AST,Multiple Functions AST,Parallel Functions AST,Parallel Multiple AST,Simple Function Exec,Python Simple Function Exec,REST Simple Function Exec,Multiple Functions Exec,Parallel Functions Exec,Parallel Multiple Exec,Relevance Detection,Cost ($ Per 1k Function Calls),Latency Mean (s),Latency Standard Deviation (s),Latency 95th Percentile (s)
-1,
-2,86.
-3,
-4,
-5,
-6,
-7,
-8,
-9,
-10,
-11,
-12,
-13,
-14,
-15,
-16,
-17,
-18,
-19,
-20,
-21,
-22,
-23,
-24,
-25,
-26,59
-27,
-28,
-29,
-30,
-31,
-32,
-33,
-34,
-35,
-36,
-37,
-38,
-39,
+1,87.00%,GPT-4-0125-Preview (Prompt),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,89.09%,88.10%,88.36%,94.75%,67.00%,80.00%,95.00%,90.50%,82.50%,99.41%,100.00%,98.57%,94.00%,84.00%,75.00%,70.42%,5.25,1.97,1.31,4.43
+2,86.47%,Claude-3-Opus-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,86.43%,86.16%,86.73%,93.75%,66.00%,72.00%,94.00%,86.00%,79.00%,97.65%,98.00%,97.14%,92.00%,80.00%,75.00%,80.42%,10.84,4.61,1.53,7.32
+3,84.53%,Gemini-1.5-Pro-Preview-0514 (FC),https://deepmind.google/technologies/gemini/pro/,Google,Proprietary,84.20%,83.32%,79.82%,91.50%,50.00%,46.00%,91.50%,90.50%,75.00%,91.76%,96.00%,85.71%,88.00%,76.00%,77.50%,89.58%,0.86,1.94,0.79,3.46
+4,84.35%,GPT-4-1106-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,85.45%,81.73%,83.82%,92.75%,61.00%,58.00%,91.50%,91.50%,75.00%,89.41%,95.00%,81.43%,92.00%,78.00%,67.50%,80.42%,5.07,5.97,6.31,18.14
+5,84.24%,GPT-4-turbo-2024-04-09 (Prompt),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,87.22%,86.04%,86.36%,93.75%,60.00%,80.00%,95.00%,90.00%,77.50%,97.65%,97.00%,98.57%,94.00%,80.00%,72.50%,62.50%,5.25,2.57,2.2,5.72
+6,84.12%,Gemini-1.5-Pro-Preview-0409 (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,84.03%,82.88%,79.64%,91.25%,53.00%,40.00%,92.00%,89.50%,75.00%,90.00%,96.00%,81.43%,90.00%,74.00%,77.50%,88.75%,0.86,1.95,0.89,3.53
+7,83.29%,Gorilla-OpenFunctions-v2 (FC),https://gorilla.cs.berkeley.edu/blogs/7_open_functions_v2.html,Gorilla LLM,Apache 2.0,86.45%,81.55%,87.82%,94.50%,67.00%,76.00%,95.00%,87.50%,75.50%,94.71%,94.00%,95.71%,94.00%,70.00%,67.50%,61.25%,0.31,0.05,N/A,N/A
+8,83.12%,GPT-4-0125-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,84.00%,84.76%,80.00%,90.25%,53.00%,52.00%,93.00%,90.00%,73.00%,83.53%,98.00%,62.86%,92.00%,86.00%,77.50%,82.92%,4.83,4.72,5.76,18.62
+9,82.12%,Meta-Llama-3-70B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,84.24%,85.90%,81.45%,92.25%,49.00%,60.00%,92.00%,91.00%,72.50%,94.12%,97.00%,90.00%,90.00%,82.00%,77.50%,66.67%,1.1,0.18,N/A,N/A
+10,81.35%,GPT-4o-2024-05-13 (FC),https://openai.com/index/hello-gpt-4o/,OpenAI,Proprietary,82.01%,80.37%,78.55%,87.75%,56.00%,50.00%,90.00%,87.50%,72.00%,86.47%,94.00%,75.71%,78.00%,82.00%,75.00%,81.25%,2.33,2.09,2.52,6.93
+11,81.24%,GPT-4-turbo-2024-04-09 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,82.14%,78.61%,74.55%,91.00%,33.00%,26.00%,90.00%,89.00%,75.00%,82.94%,93.00%,68.57%,88.00%,76.00%,67.50%,88.75%,4.78,5.48,6.37,18.51
+12,81.06%,Claude-3-Sonnet-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,85.77%,86.76%,83.09%,92.00%,53.00%,72.00%,89.00%,88.00%,83.00%,93.53%,96.00%,90.00%,92.00%,84.00%,77.50%,51.25%,2.13,1.95,1.16,3.08
+13,80.24%,Mistral-Medium-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,81.47%,73.47%,80.36%,90.00%,54.00%,56.00%,92.00%,84.00%,69.50%,65.88%,96.00%,22.86%,76.00%,82.00%,70.00%,88.33%,1.76,2.75,2.15,6.38
+14,79.53%,GPT-4o-2024-05-13 (Prompt),https://openai.com/index/hello-gpt-4o/,OpenAI,Proprietary,75.02%,77.62%,85.09%,90.75%,68.00%,74.00%,84.00%,78.50%,52.50%,90.00%,95.00%,82.86%,78.00%,70.00%,72.50%,82.50%,2.67,1.15,0.78,2.67
+15,79.06%,Command-R-Plus (Prompt) (Optimized),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,80.93%,86.74%,82.73%,89.50%,64.00%,66.00%,88.50%,81.00%,71.50%,92.94%,97.00%,87.14%,90.00%,84.00%,80.00%,54.17%,1.9,1.27,0.93,3.24
+16,79.00%,Command-R-Plus (Prompt) (Original),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,81.22%,86.24%,82.36%,89.50%,63.00%,64.00%,90.00%,80.00%,72.50%,92.94%,98.00%,85.71%,88.00%,84.00%,80.00%,53.75%,1.9,1.32,0.94,3.25
+17,79.00%,Functionary-Medium-v2.4 (FC),https://huggingface.co/meetkai/functionary-medium-v2.4,MeetKai,MIT,82.57%,75.71%,79.27%,87.75%,55.00%,60.00%,90.50%,87.50%,73.00%,68.82%,85.00%,45.71%,84.00%,80.00%,70.00%,74.17%,N/A,2.49,2.69,7.45
+18,78.88%,Gemini-1.5-Flash-Preview-0514 (FC),https://deepmind.google/technologies/gemini/flash/,Google,Proprietary,78.51%,74.57%,80.55%,90.50%,58.00%,46.00%,93.00%,77.50%,63.00%,81.76%,94.00%,64.29%,90.00%,54.00%,72.50%,79.58%,0.07,1.0,0.49,1.54
+19,78.47%,Functionary-Small-v2.4 (FC),https://huggingface.co/meetkai/functionary-small-v2.4,MeetKai,MIT,80.50%,76.31%,82.00%,91.50%,56.00%,58.00%,88.50%,82.00%,69.50%,78.24%,96.00%,52.86%,82.00%,80.00%,65.00%,67.92%,N/A,2.43,2.55,7.18
+20,78.12%,Command-R-Plus (FC) (Optimized),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,81.81%,77.17%,78.73%,89.25%,46.00%,60.00%,91.00%,87.50%,70.00%,81.18%,95.00%,61.43%,86.00%,74.00%,67.50%,63.75%,1.12,1.9,1.34,4.0
+21,76.47%,Claude-3-Opus-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,71.01%,71.27%,82.55%,89.25%,61.00%,72.00%,91.50%,58.00%,52.00%,90.59%,97.00%,81.43%,94.00%,38.00%,62.50%,82.50%,30.87,12.92,3.95,20.48
+22,75.24%,Claude-instant-1.2 (Prompt),https://www.anthropic.com/news/releasing-claude-instant-1-2,Anthropic,Proprietary,76.49%,77.93%,79.45%,86.50%,56.00%,70.00%,85.50%,83.00%,58.00%,84.71%,94.00%,71.43%,80.00%,82.00%,65.00%,57.50%,0.45,1.21,0.69,2.22
+23,73.29%,Claude-3-Haiku-20240307 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,77.06%,70.49%,84.73%,93.25%,55.00%,76.00%,91.50%,84.00%,48.00%,92.94%,100.00%,82.86%,94.00%,70.00%,25.00%,34.58%,0.18,1.0,0.49,1.72
+24,70.29%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,63.91%,62.17%,79.64%,88.00%,54.00%,64.00%,76.00%,55.50%,44.50%,71.18%,90.00%,44.29%,84.00%,46.00%,47.50%,83.33%,4.81,3.27,2.13,7.38
+25,69.47%,Command-R-Plus (FC) (Original),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,77.89%,73.19%,74.55%,84.00%,45.00%,58.00%,90.00%,82.00%,65.00%,81.76%,92.00%,67.14%,88.00%,68.00%,55.00%,24.17%,1.09,1.9,0.99,3.99
+26,67.59%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,62.39%,60.01%,66.55%,89.00%,5.00%,10.00%,94.50%,25.50%,63.00%,83.53%,99.00%,61.43%,96.00%,8.00%,52.50%,84.17%,2.47,3.02,2.94,8.85
+27,66.29%,Gemini-1.0-Pro-001 (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,55.51%,56.74%,78.55%,92.25%,42.00%,42.00%,92.00%,30.00%,21.50%,86.47%,89.00%,82.86%,84.00%,44.00%,12.50%,80.00%,0.13,1.27,1.0,3.39
+28,64.88%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,64.50%,74.92%,64.00%,75.75%,30.00%,38.00%,71.50%,72.00%,50.50%,71.18%,80.00%,58.57%,86.00%,80.00%,62.50%,55.83%,1.25,0.64,0.41,1.34
+29,63.88%,Snowflake/snowflake-arctic-instruct (Prompt),https://huggingface.co/Snowflake/snowflake-arctic-instruct,Snowflake,apache-2.0,58.42%,80.04%,62.18%,67.25%,42.00%,62.00%,69.00%,59.00%,43.50%,87.65%,91.00%,82.86%,86.00%,74.00%,72.50%,59.58%,N/A,0.98,0.56,2.13
+30,63.06%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,68.82%,64.93%,81.27%,89.25%,62.00%,56.00%,93.50%,31.50%,69.00%,94.71%,95.00%,94.29%,92.00%,8.00%,65.00%,0.00%,1.97,2.07,1.33,4.97
+31,62.47%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,71.82%,81.38%,61.27%,63.25%,53.00%,62.00%,66.00%,90.00%,70.00%,93.53%,95.00%,91.43%,80.00%,82.00%,70.00%,2.08%,0.19,1.27,0.74,2.47
+32,59.94%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,64.48%,52.62%,80.91%,90.00%,56.00%,58.00%,96.00%,39.00%,42.00%,96.47%,100.00%,91.43%,92.00%,12.00%,10.00%,0.00%,0.48,1.14,0.81,2.52
+33,59.18%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.14%,43.32%,76.55%,85.75%,49.00%,58.00%,88.00%,6.00%,6.00%,85.29%,96.00%,70.00%,88.00%,0.00%,0.00%,81.67%,3.44,3.25,1.46,6.85
+34,58.88%,Meta-Llama-3-8B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,59.25%,70.01%,58.00%,62.00%,44.00%,54.00%,72.50%,58.50%,48.00%,67.06%,67.00%,67.14%,82.00%,66.00%,65.00%,45.83%,0.24,0.04,N/A,N/A
+35,58.53%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,67.83%,55.62%,71.82%,81.50%,42.00%,54.00%,80.50%,66.50%,52.50%,56.47%,78.00%,25.71%,70.00%,56.00%,40.00%,10.83%,0.49,0.08,N/A,N/A
+36,53.71%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.95%,46.79%,85.82%,95.00%,60.00%,64.00%,93.50%,0.50%,0.00%,91.18%,96.00%,84.29%,94.00%,2.00%,0.00%,20.83%,0.29,1.49,0.61,2.4
+37,53.59%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,40.70%,39.79%,69.82%,90.00%,13.00%,22.00%,93.00%,0.00%,0.00%,71.18%,95.00%,37.14%,88.00%,0.00%,0.00%,73.33%,N/A,1.69,1.53,4.61
+38,53.53%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,39.16%,38.53%,63.64%,86.25%,4.00%,2.00%,93.00%,0.00%,0.00%,64.12%,95.00%,20.00%,90.00%,0.00%,0.00%,91.67%,10.37,3.49,3.27,10.88
+39,51.94%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,48.24%,36.16%,55.45%,69.75%,26.00%,0.00%,56.50%,47.50%,33.50%,27.65%,46.00%,1.43%,20.00%,62.00%,35.00%,83.75%,0.13,1.45,1.41,4.39
+40,51.71%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,54.73%,61.78%,68.91%,74.25%,52.00%,60.00%,76.00%,30.50%,43.50%,64.12%,93.00%,22.86%,82.00%,46.00%,55.00%,2.08%,N/A,1.83,1.36,4.43
+41,42.65%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,38.88%,31.75%,42.00%,47.50%,29.00%,24.00%,48.00%,30.00%,35.50%,30.00%,44.00%,10.00%,32.00%,40.00%,25.00%,70.83%,0.37,0.06,N/A,N/A
+42,39.82%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,37.27%,30.89%,39.09%,49.75%,4.00%,24.00%,49.00%,37.00%,24.00%,37.06%,38.00%,35.71%,38.00%,36.00%,12.50%,57.08%,3.24,0.53,N/A,N/A
+43,39.59%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,37.83%,38.03%,5.82%,6.00%,6.00%,4.00%,8.00%,79.00%,58.50%,34.12%,6.00%,74.29%,20.00%,68.00%,30.00%,98.33%,0.64,1.11,0.95,3.03
+44,23.71%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.62%,34.37%,2.00%,2.75%,0.00%,0.00%,2.50%,3.00%,3.00%,56.47%,79.00%,24.29%,70.00%,6.00%,5.00%,99.58%,0.97,3.06,1.8,6.23