Commit History
Upload from GitHub Actions: Translate MMLU and evaluate
		4c5c136
	
		
		verified
	Upload from GitHub Actions: Correlation plot
		b0aa389
	
		
		verified
	Upload from GitHub Actions: Evaluate on autotranslated GSM dataset
		f3a09a2
	
		
		verified
	Upload from GitHub Actions: Evaluate Google Translate
		338dc9b
	
		
		verified
	Upload from GitHub Actions: More models and languages
		a73f888
	
		
		verified
	Upload from GitHub Actions: Evaluate on 40 languages
		941d5c5
	
		
		verified
	Upload from GitHub Actions: Add math benchmarks
		549360a
	
		
		verified
	Upload from GitHub Actions: More results
		52abc5b
	
		
		verified
	Upload from GitHub Actions: Update model ranking fetching
		f840423
	
		
		verified
	Upload from GitHub Actions: Use FLORES+ via Hugging Face
		913253a
	
		
		verified
	Upload from GitHub Actions: More models
		0bd935e
	
		
		verified
	Upload from GitHub Actions: Increase n_models
		d09b095
	
		
		verified
	Upload from GitHub Actions: New results
		b311dd5
	
		
		verified
	Upload from GitHub Actions: Fix vibecoding
		75010c2
	
		
		verified
	Upload from GitHub Actions: Ugly fix for CI errors
		adc94d7
	
		
		verified
	Upload from GitHub Actions: Exclude free models from evals
		c9e9db6
	
		
		verified
	Block gemini-2.5-pro-exp-03-25
		092c06a
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Use most popular current + historical models
		9983b5f
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Only run tasks for which there is no result yet
		2f9dee1
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Run on 40 languages, additional models
		260c1a3
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Run on 15 languages
		f8a3dad
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Update models
		8941a67
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Implement MMLU task
		a683732
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Add Global MMLU benchmark
		ce2acb0
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Run on 100 languages, adjust display
		8274634
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Add Dockerfile
		4d13673
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Language selection checkboxes & filtering in backend
		d91b022
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Basic backend setup with FastAPI but without actual filtering
		2c21cf7
	
		
		
	
		David Pomerenke
		
	committed on
		
		
spBLEU tokenizer, run on more languages
		eaf2d97
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Process data for country map
		723f963
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Autonyms and cooler dataset search display
		33469f2
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Nicer layout for datasets table and other tables
		430bde6
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Datasets table
		11c32ae
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Basic language table
		d1a7111
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Params and license metadata from HF API
		3ed02d5
	
		
		
	
		David Pomerenke
		
	committed on
		
		
Refactor eval code into files
		da6e1bc
	
		
		
	
		David Pomerenke
		
	committed on