Commit 10ffff3 (verified), committed by xinliucs
1 Parent(s): 783b66b

Update verifact_data.csv

Files changed (1)
  1. verifact_data.csv +36 -24
@@ -1,25 +1,37 @@
 tier,model,FactBench,Reddit,Overall
-F1,GPT4o,80.93,42.76,67.41
-F1,Claude 3.5-Sonnet,75.68,42.90,63.65
-F1,Gemini 1.5-Flash,77.38,40.26,64.10
-F1,Llama3.1-8b,60.71,28.86,48.62
-F1,Llama3.1-70b,65.83,38.61,55.12
-F1,Llama3.1-405B,73.23,38.98,60.61
-F1,Qwen2.5-8b,69.23,37.25,55.78
-F1,Qwen2.5-32b,71.31,37.34,60.00
-Recall,GPT4o,77.13,30.06,57.93
-Recall,Claude 3.5-Sonnet,69.35,30.69,53.58
-Recall,Gemini 1.5-Flash,70.71,27.67,53.16
-Recall,Llama3.1-8b,54.28,20.39,40.46
-Recall,Llama3.1-70b,58.00,29.31,46.30
-Recall,Llama3.1-405B,68.40,28.00,51.92
-Recall,Qwen2.5-8b,58.66,26.01,45.34
-Recall,Qwen2.5-32b,62.77,25.38,47.52
-Precision,GPT4o,85.11,74.04,80.59
-Precision,Claude 3.5-Sonnet,83.28,71.25,78.37
-Precision,Gemini 1.5-Flash,85.45,73.87,80.72
-Precision,Llama3.1-8b,68.87,49.36,60.91
-Precision,Llama3.1-70b,76.05,56.54,68.09
-Precision,Llama3.1-405B,78.80,64.10,72.80
-Precision,Qwen2.5-8b,77.18,65.58,72.45
-Precision,Qwen2.5-32b,82.74,70.60,77.79
+F1,GPT4o,80.92,27.45,64.35
+F1,Claude 3.5-Sonnet,75.68,26.31,59.67
+F1,Gemini 1.5-Flash,77.38,28.68,61.63
+F1,Mistral-7B,62.30,21.71,48.63
+F1,Mistral-24B,70.84,28.36,56.46
+F1,Mistral-123B,75.20,27.33,59.49
+F1,Llama3.1-8b,60.48,20.70,46.89
+F1,Llama3.1-70b,64.80,23.90,51.59
+F1,Llama3.1-405B,73.23,23.15,57.54
+F1,Qwen2.5-8b,66.25,22.86,52.39
+F1,Qwen2.5-32b,72.25,27.88,57.52
+F1,Qwen2.5-72B,73.09,26.95,57.82
+Recall,GPT4o,77.13,16.54,52.42
+Recall,Claude 3.5-Sonnet,69.35,15.94,47.57
+Recall,Gemini 1.5-Flash,70.71,17.50,49.01
+Recall,Mistral-7B,51.96,12.82,36.00
+Recall,Mistral-24B,61.48,17.46,43.53
+Recall,Mistral-123B,67.28,16.57,46.60
+Recall,Llama3.1-8b,54.28,12.73,37.33
+Recall,Llama3.1-70b,58.00,14.42,40.23
+Recall,Llama3.1-405B,68.40,13.75,46.11
+Recall,Qwen2.5-8b,58.66,13.53,40.25
+Recall,Qwen2.5-32b,62.77,16.91,44.07
+Recall,Qwen2.5-72B,64.12,16.29,44.61
+Precision,GPT4o,85.11,80.66,83.30
+Precision,Claude 3.5-Sonnet,83.28,75.35,80.05
+Precision,Gemini 1.5-Flash,85.45,79.48,83.02
+Precision,Mistral-7B,77.79,70.72,74.91
+Precision,Mistral-24B,83.61,75.51,80.31
+Precision,Mistral-123B,85.24,77.88,82.24
+Precision,Llama3.1-8b,68.27,55.40,63.02
+Precision,Llama3.1-70b,73.40,69.72,71.90
+Precision,Llama3.1-405B,78.80,73.19,76.51
+Precision,Qwen2.5-8b,76.09,73.64,75.09
+Precision,Qwen2.5-32b,85.11,79.44,82.80
+Precision,Qwen2.5-72B,84.97,78.02,82.14
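
The file stores one row per (tier, model) pair, so the three metrics for a model sit on three separate rows. As a minimal sketch of how a reader might consume this layout, the Python below groups rows by model and recomputes F1 as the harmonic mean of precision and recall for the updated GPT4o rows. The names `SAMPLE`, `load_scores`, and `harmonic_f1` are hypothetical and not part of the repository. That the F1 tier is the harmonic mean of the other two tiers is an assumption: it holds for these three rows to within rounding, but the commit does not state how the values were produced.

```python
import csv
import io

# Three-row excerpt of the updated verifact_data.csv (GPT4o only),
# inlined so the sketch runs without the file on disk.
SAMPLE = """tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.92,27.45,64.35
Recall,GPT4o,77.13,16.54,52.42
Precision,GPT4o,85.11,80.66,83.30
"""


def load_scores(text):
    """Return {model: {tier: {column: float}}} from the long-format CSV text."""
    scores = {}
    for row in csv.DictReader(io.StringIO(text)):
        tier = row.pop("tier")
        model = row.pop("model")
        # Remaining keys are the score columns: FactBench, Reddit, Overall.
        scores.setdefault(model, {})[tier] = {k: float(v) for k, v in row.items()}
    return scores


def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of the F1 tier)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


scores = load_scores(SAMPLE)
gpt4o = scores["GPT4o"]

# Recompute F1 per column and compare against the stored F1 row.
recomputed = {
    col: round(harmonic_f1(gpt4o["Precision"][col], gpt4o["Recall"][col]), 2)
    for col in ("FactBench", "Reddit", "Overall")
}
for col, value in recomputed.items():
    print(col, "stored:", gpt4o["F1"][col], "recomputed:", value)
```

To run the same check on the full file, pass the file contents to `load_scores` in place of `SAMPLE` and compare with a small tolerance, since the stored precision and recall values are already rounded to two decimals.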