xeon27 committed · fcd47ae
1 Parent(s): cdca101
[WIP] Add task link in description
src/about.py CHANGED (+1 -1)
@@ -44,7 +44,7 @@ TITLE = """<h1 align="center" id="space-title">LLM Evaluation Leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-This leaderboard presents the performance of selected LLM models on a set of tasks. The tasks are divided into two categories: base and agentic. The base tasks are ARC-Easy, ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond. The agentic tasks are GAIA and GDM-InterCode-CTF.
+This leaderboard presents the performance of selected LLM models on a set of tasks. The tasks are divided into two categories: base and agentic. The base tasks are: [ARC-Easy](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc), ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond. The agentic tasks are GAIA and GDM-InterCode-CTF.
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
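For the added link to take effect, INTRODUCTION_TEXT has to be rendered as markdown somewhere in the app. A minimal sketch of how that typically happens, assuming this Space follows the stock Hugging Face leaderboard template (the app.py layout and component choices below are assumptions, not part of this commit):

# app.py — hypothetical rendering path, assuming the stock HF leaderboard
# template; nothing below is shown in this diff.
import gradio as gr

from src.about import INTRODUCTION_TEXT, TITLE

with gr.Blocks() as demo:
    # TITLE is a raw HTML <h1> string, so it goes through gr.HTML.
    gr.HTML(TITLE)
    # gr.Markdown renders the string as markdown, turning
    # [ARC-Easy](https://...) into a clickable anchor in the description.
    gr.Markdown(INTRODUCTION_TEXT)

if __name__ == "__main__":
    demo.launch()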