shezamunir committed on
Commit 7e2dde3 · 1 Parent(s): 5306259

Create verifact_data.csv

Files changed (1)
  1. verifact_data.csv +25 -0
verifact_data.csv ADDED
@@ -0,0 +1,25 @@
+ tier,model,f1,precision,recall
+ Overall,GPT4o,67.41,80.59,57.93
+ FactBench,GPT4o,80.93,85.11,77.13
+ Reddit,GPT4o,42.76,74.04,30.06
+ Overall,Claude 3.5-Sonnet,63.65,78.37,53.58
+ FactBench,Claude 3.5-Sonnet,75.68,83.28,69.35
+ Reddit,Claude 3.5-Sonnet,42.90,71.25,30.69
+ Overall,Gemini 1.5-Flash,64.10,80.72,53.16
+ FactBench,Gemini 1.5-Flash,77.38,85.45,70.71
+ Reddit,Gemini 1.5-Flash,40.26,73.87,27.67
+ Overall,Llama3.1-8b,48.62,60.91,40.46
+ FactBench,Llama3.1-8b,60.71,68.87,54.28
+ Reddit,Llama3.1-8b,28.86,49.36,20.39
+ Overall,Llama3.1-70b,55.12,68.09,46.30
+ FactBench,Llama3.1-70b,65.83,76.05,58.00
+ Reddit,Llama3.1-70b,38.61,56.54,29.31
+ Overall,Llama3.1-405B,60.61,72.80,51.92
+ FactBench,Llama3.1-405B,73.23,78.80,68.40
+ Reddit,Llama3.1-405B,38.98,64.10,28.00
+ Overall,Qwen2.5-8b,55.78,72.45,45.34
+ FactBench,Qwen2.5-8b,69.23,77.18,58.66
+ Reddit,Qwen2.5-8b,37.25,65.58,26.01
+ Overall,Qwen2.5-32b,60.00,77.79,47.52
+ FactBench,Qwen2.5-32b,71.31,82.74,62.77
+ Reddit,Qwen2.5-32b,37.34,70.60,25.38
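
For anyone consuming this file, a minimal Python sketch of reading it and sanity-checking the `f1` column might look like the following. It assumes `f1` is intended as the harmonic mean of the `precision` and `recall` columns in the same row, which the commit itself does not state; if the scores were averaged per example before aggregation, small gaps are expected. The helper names (`harmonic_f1`, `load_rows`, `f1_gaps`) and the 0.05 tolerance are hypothetical, and `SAMPLE` holds only the first three data rows of the file.

```python
import csv
import io

# Header plus the first three data rows of verifact_data.csv, copied from the diff.
SAMPLE = """tier,model,f1,precision,recall
Overall,GPT4o,67.41,80.59,57.93
FactBench,GPT4o,80.93,85.11,77.13
Reddit,GPT4o,42.76,74.04,30.06
"""


def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def load_rows(text):
    """Parse CSV text into dicts, converting the three score columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in ("f1", "precision", "recall"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


def f1_gaps(rows, tolerance=0.05):
    """Return (tier, model, reported, recomputed) for rows whose reported f1
    differs from the recomputed harmonic mean by more than `tolerance`.
    The tolerance is an assumed allowance for two-decimal rounding."""
    flagged = []
    for row in rows:
        recomputed = harmonic_f1(row["precision"], row["recall"])
        if abs(recomputed - row["f1"]) > tolerance:
            flagged.append((row["tier"], row["model"], row["f1"], round(recomputed, 2)))
    return flagged


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    for tier, model, reported, recomputed in f1_gaps(rows):
        print(f"{tier}/{model}: reported {reported}, recomputed {recomputed}")
```

To run the same check on the full file, pass the file's contents to `load_rows` instead of `SAMPLE`, for example `load_rows(open("verifact_data.csv", newline="").read())`. A flagged row is only a prompt to look at how that score was aggregated, not proof of an error.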