Peter Kruger PRO

PeterKruger

AI & ML interests

Neural networks (since 1993), LLMs, AI-based financial analysis, LLM Benchmarks

Recent Activity

updated a Space 7 days ago

AutoBench/README

updated a model 7 days ago

AutoBench/AutoBench_1.0

commented on their article 9 days ago

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

PeterKruger's activity

updated a Space 7 days ago

README

updated a model 7 days ago

AutoBench/AutoBench_1.0

Updated 7 days ago • 2

commented on Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) 9 days ago

Nice and fully accurate. Excellent job. Thanks!

New activity in AutoBench/AutoBench_1.0 10 days ago

Comparing with mt-bench

#3 opened 10 days ago by

posted an update 10 days ago

AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark
https://huggingface.co/blog/PeterKruger/autobench

New activity in AutoBench/AutoBench_1.0 10 days ago

Pool LLM bias

#2 opened 10 days ago by

Prompt analysis should be better discussed

#1 opened 10 days ago by

upvoted an article 10 days ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

liked a Space 10 days ago

AutoBench 1.0 Demo

Collective-Model-As-Judge LLM Benchmark

liked a model 10 days ago

AutoBench/AutoBench_1.0

Updated 7 days ago • 2

updated a Space 10 days ago

AutoBench 1.0 Demo

Collective-Model-As-Judge LLM Benchmark

published an article 10 days ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

updated a dataset 10 days ago

AutoBench/AutoBench_Results_20_LLMs

Preview • Updated 10 days ago • 67

published a dataset 10 days ago

AutoBench/AutoBench_Results_20_LLMs

Preview • Updated 10 days ago • 67

published 2 Spaces 10 days ago

README

AutoBench 1.0 Demo

Collective-Model-As-Judge LLM Benchmark

published a model 10 days ago

AutoBench/AutoBench_1.0

Updated 7 days ago • 2

updated a Space 11 days ago

AutoBench 1.0 Demo

Collective-Model-As-Judge LLM Benchmark

updated 2 models 11 days ago

AutoBench/AutoBench_1.0

Updated 7 days ago • 2

AutoBench/AutoBench_1.0

Updated 7 days ago • 2