xeon27 committed on
Commit
159e996
·
1 Parent(s): fcd47ae

[WIP] Add task link in description

Files changed (1)
  1. src/about.py +2 -2
src/about.py CHANGED
@@ -44,14 +44,14 @@ TITLE = """<h1 align="center" id="space-title">LLM Evaluation Leaderboard</h1>""
 
  # What does your leaderboard evaluate?
  INTRODUCTION_TEXT = """
- This leaderboard presents the performance of selected LLM models on a set of tasks. The tasks are divided into two categories: base and agentic. The base tasks are: [ARC-Easy]("https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc"), ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond. The agentic tasks are GAIA and GDM-InterCode-CTF.
+ This leaderboard presents the performance of selected LLM models on a set of tasks. The tasks are divided into two categories: base and agentic. The base tasks are: [ARC-Easy](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc), ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond. The agentic tasks are GAIA and GDM-InterCode-CTF.
  """
 
  # Which evaluations are you running? how can people reproduce what you have?
  LLM_BENCHMARKS_TEXT = f"""
  ## How it works
  The following benchmarks are included:
- Base: ARC-Easy, ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond
+ Base: [ARC-Easy](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc), ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond
  Agentic: GAIA, GDM-InterCode-CTF
 
  ## Reproducibility
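
Note on the change: the removed line wrote the link as `[ARC-Easy]("…")`, and in most Markdown renderers the quotes would likely be treated as part of the link destination, so the link would not resolve to the intended URL. Since the commit is marked WIP and only ARC-Easy is linked so far, one way to avoid repeating the mistake as more task links are added is to build the list programmatically. The following is a minimal sketch under that assumption; `md_link`, `BASE_TASKS`, and `base_line` are hypothetical names that do not appear in `src/about.py`, and the only URL used is the one from this commit.

```python
# Hypothetical helper, not part of src/about.py.
ARC_URL = "https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc"


def md_link(label, url=None):
    """Return a Markdown inline link, or the bare label when no URL is known."""
    if url is None:
        return label
    # Strip stray double quotes so the destination is written (url), not ("url").
    return f"[{label}]({url.strip(chr(34))})"


# Task names are taken from the diff; only ARC-Easy has a link in this commit.
BASE_TASKS = [
    ("ARC-Easy", ARC_URL),
    ("ARC-Challenge", None),
    ("DROP", None),
    ("WinoGrande", None),
    ("GSM8K", None),
    ("HellaSwag", None),
    ("HumanEval", None),
    ("IFEval", None),
    ("MATH", None),
    ("MMLU", None),
    ("MMLU-Pro", None),
    ("GPQA-Diamond", None),
]

# Assemble the "Base:" line in the same shape as the added line in the diff.
base_line = "Base: " + ", ".join(md_link(name, url) for name, url in BASE_TASKS)
print(base_line)
```

Because `LLM_BENCHMARKS_TEXT` is already an f-string, a line built this way could be interpolated as `{base_line}` rather than typed by hand, though whether the author intends that is not stated in the commit.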