Commit b6fb488 (1 parent: 236a68e)
Author: wenhuchen
update leaderboard
--- a/utils.py
+++ b/utils.py
@@ -27,17 +27,21 @@ COLUMN_NAMES = MODEL_INFO
 
 LEADERBORAD_INTRODUCTION = """# Science Leaderboard
 
-
+**"Which large language model is the BEST on science and engineering?"**<br>
 Welcome to the **Science** leaderboard! The leaderboard covers the most popular evaluation for different science subjects including math, physics, biology, chemistry, computer science, finance.
 <div style="display: flex; flex-wrap: wrap; align-items: center; gap: 10px;">
 </div>
 The evaluation set from the following datasets are being included in the leaderboard.
 <ul>
-<li> MATH: this contains the test set of 5000 questions from American Math contest covering different fields like algebra, calculus, statistics, geometry, linear algebra, number theory.
-<li> GSM8K: this contains the test set of 1320 questions from grade school math word problems. This dataset is mainly covering algebra problems.
-<li> TheoremQA: this contains the test set of 800 questions collected from college-level exams. This covers math, physics, engineering and finance.
-<li> GPQA: this contains the test of 198 questions from college-level dataset GPQA-diamond. This covers many fields like chemistry, genetics, biology, etc.
+<li> MATH (4-shot): this contains the test set of 5000 questions from American Math contest covering different fields like algebra, calculus, statistics, geometry, linear algebra, number theory.
+<li> GSM8K (4-shot): this contains the test set of 1320 questions from grade school math word problems. This dataset is mainly covering algebra problems.
+<li> TheoremQA (5-shot): this contains the test set of 800 questions collected from college-level exams. This covers math, physics, engineering and finance.
+<li> GPQA (5-shot): this contains the test of 198 questions from college-level dataset GPQA-diamond. This covers many fields like chemistry, genetics, biology, etc.
 </ul>
+
+**"How to evaluate your model and submit your results?"**<br>
+Please refer to the guideline in <a href="https://github.com/TIGER-AI-Lab/MAmmoTH/blob/main/math_eval/README.md">Github</a> to evaluate your own model.
+
 <a href='https://hits.seeyoufarm.com'><img src='https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fhuggingface.co%2Fspaces%2FTIGER-Lab%2FTheoremQA-Leaderboard&count_bg=%23C7C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false'></a>
 """
 
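As a rough illustration of the evaluation setup this commit documents, the shot counts and test-set sizes stated in the new introduction text could be collected in a small lookup table. This is only a sketch: `EVAL_SETTINGS`, `describe`, and `total_questions` are hypothetical names and are not part of `utils.py`; the numbers are copied from the introduction string as written.

```python
# Hypothetical summary of the benchmarks named in LEADERBORAD_INTRODUCTION.
# Shot counts and test-set sizes are copied from the introduction text;
# the structure and names here are assumptions, not the real utils.py.
EVAL_SETTINGS = {
    "MATH": {"shots": 4, "questions": 5000},
    "GSM8K": {"shots": 4, "questions": 1320},
    "TheoremQA": {"shots": 5, "questions": 800},
    "GPQA": {"shots": 5, "questions": 198},
}


def describe(name: str) -> str:
    """Return a label such as 'MATH (4-shot)' for a benchmark name."""
    setting = EVAL_SETTINGS[name]
    return f"{name} ({setting['shots']}-shot)"


def total_questions() -> int:
    """Sum the stated test-set sizes across all listed benchmarks."""
    return sum(s["questions"] for s in EVAL_SETTINGS.values())


if __name__ == "__main__":
    for benchmark, setting in EVAL_SETTINGS.items():
        print(describe(benchmark), setting["questions"])
    print("total:", total_questions())
```

Keeping the settings in one table like this would let the list items in the introduction be generated rather than typed by hand, which avoids the label and the actual evaluation setting drifting apart.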