| Model | Open-Ended VQA (% Human Rating) | Multiple-Choice VQA (% Accuracy) | Hints: Multiple-Choice VQA (% Accuracy) | Attributions: Multiple-Choice VQA (% Accuracy) | Reference-Based Automatic Evaluation (Judge Accuracy vs. Human Ratings) | Reference-Free Automatic Evaluation (Judge Accuracy vs. Human Ratings) | Automatic Evaluation (% Auto-Rater Ratings) | Hints: Automatic Evaluation (% Auto-Rater Ratings) | Attributions: Automatic Evaluation (% Auto-Rater Ratings) |
|---|---|---|---|---|---|---|---|---|---|
| Humans | 82 | 78 | – | – | – | – | – | – | – |
| Gemini Pro 1.5 | 40 | 38 | 66 | 72 | 87 | 52 | 53 | 62 | 29 |
| Gemini Pro Vision | 30 | 41 | 62 | 75 | – | – | 38 | 34 | 47 |
| GPT-4 | 34 | 45 | 69 | 82 | 86 | 51 | 38 | 61 | 25 |
| LLaVA-1.6-34B | 15 | 24 | 30 | 76 | – | – | 43 | 21 | 16 |
| LLaVA-1.5-7B | 13 | 17 | 29 | 70 | – | – | 35 | 19 | 30 |
| InstructBLIP | 13 | 20 | 28 | – | – | – | – | – | – |
| Gemini Pro 1.5 Caption → Gemini Pro 1.5 | 23 | – | – | – | – | – | – | – | – |
| Human (Oracle) Caption → Gemini Pro 1.5 | 50 | – | – | – | – | – | – | – | – |
| Claude 3.5 Sonnet | 46 | 45 | – | – | – | – | 39 | – | – |
| GPT-4o | 55 | 83 | – | – | – | – | 50 | – | – |
| Qwen-VL-Max | 35 | 53 | – | – | – | – | 26 | – | – |
| Molmo-7B | 34 | 42 | – | – | – | – | 36 | – | – |
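The two judge-accuracy columns report how often an auto-rater's correct/incorrect verdict agrees with the human rating of the same open-ended answer. As a minimal sketch, assuming binary human labels, binary auto-rater verdicts, and simple agreement as the metric (the exact rating scales and aggregation behind the table are not specified here), the computation could look like this; the function name is hypothetical:

```python
from typing import Sequence


def judge_agreement_accuracy(
    human_labels: Sequence[int],
    judge_labels: Sequence[int],
) -> float:
    """Percentage of answers where the auto-rater's 1/0 verdict
    matches the human 1/0 rating (assumed agreement metric)."""
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(h == j for h, j in zip(human_labels, judge_labels))
    return 100.0 * matches / len(human_labels)


if __name__ == "__main__":
    # Toy example: 7 of 8 verdicts match the human ratings -> 87.5%.
    human = [1, 1, 0, 1, 0, 0, 1, 1]
    judge = [1, 1, 0, 0, 0, 0, 1, 1]
    print(f"{judge_agreement_accuracy(human, judge):.1f}%")
```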