view post Post 443 AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmarkhttps://huggingface.co/blog/PeterKruger/autobench See translation ๐ 1 1 + Reply
view article Article Escape the Benchmark Trap: AutoBench โ the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) By PeterKruger โข 10 days ago โข 6
view article Article Escape the Benchmark Trap: AutoBench โ the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) By PeterKruger โข 10 days ago โข 6