tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.92,27.45,64.35
F1,Claude 3.5-Sonnet,75.68,26.31,59.67
F1,Gemini 1.5-Flash,77.38,28.68,61.63
F1,Mistral-7B,62.30,21.71,48.63
F1,Mistral-24B,70.84,28.36,56.46
F1,Mistral-123B,75.20,27.33,59.49
F1,Llama3.1-8b,60.48,20.70,46.89
F1,Llama3.1-70b,64.80,23.90,51.59
F1,Llama3.1-405B,73.23,23.15,57.54
F1,Qwen2.5-8b,66.25,22.86,52.39
F1,Qwen2.5-32b,72.25,27.88,57.52
F1,Qwen2.5-72B,73.09,26.95,57.82
Recall,GPT4o,77.13,16.54,52.42
Recall,Claude 3.5-Sonnet,69.35,15.94,47.57
Recall,Gemini 1.5-Flash,70.71,17.50,49.01
Recall,Mistral-7B,51.96,12.82,36.00
Recall,Mistral-24B,61.48,17.46,43.53
Recall,Mistral-123B,67.28,16.57,46.60
Recall,Llama3.1-8b,54.28,12.73,37.33
Recall,Llama3.1-70b,58.00,14.42,40.23
Recall,Llama3.1-405B,68.40,13.75,46.11
Recall,Qwen2.5-8b,58.66,13.53,40.25
Recall,Qwen2.5-32b,62.77,16.91,44.07
Recall,Qwen2.5-72B,64.12,16.29,44.61
Precision,GPT4o,85.11,80.66,83.30
Precision,Claude 3.5-Sonnet,83.28,75.35,80.05
Precision,Gemini 1.5-Flash,85.45,79.48,83.02
Precision,Mistral-7B,77.79,70.72,74.91
Precision,Mistral-24B,83.61,75.51,80.31
Precision,Mistral-123B,85.24,77.88,82.24
Precision,Llama3.1-8b,68.27,55.40,63.02
Precision,Llama3.1-70b,73.40,69.72,71.90
Precision,Llama3.1-405B,78.80,73.19,76.51
Precision,Qwen2.5-8b,76.09,73.64,75.09
Precision,Qwen2.5-32b,85.11,79.44,82.80
Precision,Qwen2.5-72B,84.97,78.02,82.14
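Each F1 value in this table appears to be the harmonic mean of the Precision and Recall values for the same model and column, F1 = 2PR / (P + R), and this includes the Overall column. The Python sketch below checks that relationship on a few rows copied verbatim from the table. The helper names (`harmonic_f1`, `load`, `max_f1_gap`) are hypothetical and chosen for illustration. The 0.02-point tolerance in the check is an assumption, meant to absorb rounding to two decimals.

```python
import csv
import io

# A few rows copied verbatim from the table above (values are percentages).
TABLE = """tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.92,27.45,64.35
F1,Claude 3.5-Sonnet,75.68,26.31,59.67
F1,Mistral-7B,62.30,21.71,48.63
Recall,GPT4o,77.13,16.54,52.42
Recall,Claude 3.5-Sonnet,69.35,15.94,47.57
Recall,Mistral-7B,51.96,12.82,36.00
Precision,GPT4o,85.11,80.66,83.30
Precision,Claude 3.5-Sonnet,83.28,75.35,80.05
Precision,Mistral-7B,77.79,70.72,74.91
"""

COLUMNS = ("FactBench", "Reddit", "Overall")


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def load(text: str) -> dict:
    """Index rows as (tier, model) -> {column: value}."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows[(row["tier"], row["model"])] = {c: float(row[c]) for c in COLUMNS}
    return rows


def max_f1_gap(rows: dict) -> float:
    """Largest |reported F1 - harmonic_f1(P, R)| over all models and columns."""
    models = {model for _, model in rows}
    gap = 0.0
    for model in models:
        p = rows[("Precision", model)]
        r = rows[("Recall", model)]
        f1 = rows[("F1", model)]
        for col in COLUMNS:
            gap = max(gap, abs(f1[col] - harmonic_f1(p[col], r[col])))
    return gap


if __name__ == "__main__":
    print(f"max gap: {max_f1_gap(load(TABLE)):.3f}")
```

For the rows above, the largest discrepancy stays below 0.01 points, which fits two-decimal rounding of the inputs. The same check can be pointed at the full table by pasting every row into `TABLE`.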