Model,Row Color,Symbolic,Medium,Hard,1st<50% op,1st<10% op,Avg. Acc op≤30,Average↑,Link
deepseek-r1,yellow,7280.0,9750.85,8573.8,100,130,0.9427,8534.88,https://huggingface.co/deepseek-ai/DeepSeek-V3
o3-mini,yellow,6690.0,8335.66,5769.97,70,110,0.9423,6931.88,https://openai.com/index/openai-o3-mini/
o1-mini,yellow,5060.0,6054.91,3738.43,50,90,0.8397,4951.11,https://platform.openai.com/docs/models/o1
deepseek-v3,None,4310.0,4100.81,2407.86,24,55,0.6669,3606.22,https://huggingface.co/deepseek-ai/DeepSeek-V3
qwq-32b-preview,yellow,3530.0,3205.75,1846.19,21,50,0.5403,2860.65,https://huggingface.co/Qwen/QwQ-32B-Preview
gemini-1.5-pro-002,None,2547.0,3659.59,2318.28,26,45,0.6924,2841.62,https://aistudio.google.com/app/prompts/new_chat?model=gemini-1.5-pro-002
claude-3.5-sonnet,None,2161.0,3281.8,2115.79,26,40,0.6758,2519.53,https://www.anthropic.com/news/3-5-models-and-computer-use
mistral-large-2411,None,2332.5,2879.92,2310.49,24,50,0.6645,2507.64,https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
qwen-2.5-72b-instruct,None,2048.0,2496.81,2016.38,21,40,0.5433,2187.06,https://huggingface.co/Qwen/Qwen2.5-72B-Instruct
gpt-4o-2024-11-20,None,2379.0,2457.37,1451.54,18,30,0.5064,2095.97,https://platform.openai.com/docs/models/gpt-4o#gpt-4o
gemini-1.5-flash-002,None,1970.0,1478.75,1274.25,13,30,0.4460,1574.33,https://aistudio.google.com/app/prompts/new_chat?model=gemini-1.5-flash-002
llama-3.1-70b-instruct,None,1769.0,1650.25,1205.25,15,30,0.4314,1541.50,https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
minimax-text-01,green,1618.5,1712.64,1178.51,14,30,0.4213,1503.22,https://huggingface.co/MiniMaxAI/MiniMax-Text-01
llama-3.1-405b-instruct,None,1557.0,1321.54,950.0,11,20,0.3409,1276.18,https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct
gpt-4o-mini,None,1389.0,1406.5,913.89,12,22,0.3094,1236.46,https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/
claude-3.5-haiku,None,897.0,1053.16,784.34,10,22,0.2910,911.50,https://www.anthropic.com/news/3-5-models-and-computer-use
qwen-2.5-7b-instruct,None,786.95,886.75,618.5,7,19,0.2257,764.07,https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
llama-3.1-8b-instruct,None,462.0,786.5,606.5,6,17,0.2186,618.30,https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
jamba-1.5-large,blue,856.0,485.13,466.4,6,26,0.1828,602.51,https://huggingface.co/ai21labs/AI21-Jamba-1.5-Large