LAUNCH Lab

university

https://launch.eecs.umich.edu/

launchnlp

AI & ML interests

Factuality, reasoning, alignment, LLM applications

Recent Activity

farimafatahi authored a paper about 1 month ago

FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation

farimafatahi authored a paper about 1 month ago

Logit Arithmetic Elicits Long Reasoning Capabilities Without Training

farimafatahi authored a paper about 1 month ago

From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models

spaces 5

FactRBench

🏆

View and analyze long-form factuality leaderboard

ExpertLongBench

🚀

Leaderboard for ExpertLongBench

ManyICLBench

🚀

Leaderboard for ManyICLBench

MLRC-BENCH

📊

Display model performance rankings

Factbench

📈

View and compare language model factuality scores

datasets 12

launch/ExpertLongBench

Preview • Updated Jul 30 • 513 • 10

launch/thinkprm-1K-verification-cots

Viewer • Updated Jul 1 • 1k • 66 • 6

launch/ManyICLBench

Viewer • Updated Jun 26 • 66 • 893 • 1

launch/CMV

Viewer • Updated Jun 26 • 133 • 10

launch/FactRBench

Viewer • Updated Jun 9 • 1.06k • 81 • 1

launch/FactBench

Viewer • Updated Jun 9 • 1k • 171 • 3

launch/CLASH

Viewer • Updated Apr 16 • 345 • 84 • 2

launch/gov_report

Viewer • Updated Nov 9, 2022 • 58.4k • 406 • 7

launch/gov_report_qs

Viewer • Updated Nov 9, 2022 • 7.87k • 342 • 4

launch/open_question_type

Viewer • Updated Nov 9, 2022 • 4.96k • 1.06k • 6

Collections 1

launch/ThinkPRM-1.5B

launch/ThinkPRM-7B

launch/ThinkPRM-14B

mradermacher/ThinkPRM-7B-i1-GGUF

models 4

launch/ThinkPRM-14B

launch/ThinkPRM-1.5B

launch/ThinkPRM-7B

launch/POLITICS

Team members 16
