Update src/display/about.py
src/display/about.py (+21 -10)

@@ -55,16 +55,27 @@ For more information on the included benchmarks and instructions on evaluating y
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
-##
-
-
-
-
-
-
-
-
-
+## Included benchmarks
+
+All currently supported benchmarks are listed in the table below:
+
+| Dataset | Language | Task type | Metrics | Samples | Task ID |
+| ------------------------------------------------------------ | ----------------------------- | -------------------------- | -------------- | ------: | --------------- |
+| [AGREE](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchagree_cs) | CS (Original) | Subject-verb agreement | Acc | 627 | agree_cs |
+| [ANLI](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchanli_cs) | CS (Translated) | Natural Language Inference | Acc, Macro F1 | 1200 | anli_cs |
+| [ARC Challenge](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbencharc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 1172 | arc_cs |
+| [ARC Easy](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbencharc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 2376 | arc_cs |
+| [Belebele](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchbelebele_cs) | CS (Professional translation) | Reading Comprehension / QA | Acc | 895 | belebele_cs |
+| [CTKFacts](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchctkfacts_cs) | CS (Original) | Natural Language Inference | Acc, Macro F1 | 558 | ctkfacts_cs |
+| [Czech News](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchczechnews_cs) | CS (Original) | News Topic Classification | Acc, Macro F1 | 1000 | czechnews_cs |
+| [Facebook Comments](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchfb_comments_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 1000 | fb_comments_cs |
+| [GSM8K](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchgsm8k_cs) | CS (Translated) | Mathematical inference | EM Acc | 1319 | gsm8k_cs |
+| [Klokánek](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchklokanek_cs) | CS (Original) | Math/Logical Inference | Acc | 808 | klokanek_cs |
+| [Mall Reviews](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchmall_reviews_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 3000 | mall_reviews_cs |
+| [MMLU](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchmmlu_cs) | CS (Translated) | Knowledge-Based QA | Acc | 12408 | mmlu_cs |
+| [SQAD](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchsqad_cs) | CS (Original) | Reading Comprehension / QA | EM Acc, BoW F1 | 843 | sqad_cs |
+| [Subjectivity](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchsubjectivity_cs) | CS (Original) | Subjectivity Analysis | Acc, Macro F1 | 2000 | subjectivity_cs |
+| [TruthfulQA](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchtruthfulqa_cs) | CS (Translated) | Knowledge-Based QA | Acc | 813 | truthfulqa_cs |
 
 ## Evaluation Process
 
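Since each table row pairs a benchmark with a task ID, metrics, and sample count, the same metadata can also be kept as structured data rather than only as display text. A minimal sketch of that idea follows; the `BENCHMARKS` list and both helper functions are hypothetical illustrations, not part of the repository:

```python
# Hypothetical structured mirror of the benchmark table above
# (name, task ID, reported metrics, sample count per row).
BENCHMARKS = [
    ("AGREE",             "agree_cs",        ["Acc"],                 627),
    ("ANLI",              "anli_cs",         ["Acc", "Macro F1"],    1200),
    ("ARC Challenge",     "arc_cs",          ["Acc"],                1172),
    ("ARC Easy",          "arc_cs",          ["Acc"],                2376),
    ("Belebele",          "belebele_cs",     ["Acc"],                 895),
    ("CTKFacts",          "ctkfacts_cs",     ["Acc", "Macro F1"],     558),
    ("Czech News",        "czechnews_cs",    ["Acc", "Macro F1"],    1000),
    ("Facebook Comments", "fb_comments_cs",  ["Acc", "Macro F1"],    1000),
    ("GSM8K",             "gsm8k_cs",        ["EM Acc"],             1319),
    ("Klokánek",          "klokanek_cs",     ["Acc"],                 808),
    ("Mall Reviews",      "mall_reviews_cs", ["Acc", "Macro F1"],    3000),
    ("MMLU",              "mmlu_cs",         ["Acc"],               12408),
    ("SQAD",              "sqad_cs",         ["EM Acc", "BoW F1"],    843),
    ("Subjectivity",      "subjectivity_cs", ["Acc", "Macro F1"],    2000),
    ("TruthfulQA",        "truthfulqa_cs",   ["Acc"],                 813),
]

def total_samples():
    """Sum of evaluation samples across all listed benchmarks."""
    return sum(samples for _, _, _, samples in BENCHMARKS)

def task_ids_with_metric(metric):
    """Sorted, de-duplicated task IDs of benchmarks reporting the given metric."""
    return sorted({task_id for _, task_id, metrics, _ in BENCHMARKS if metric in metrics})
```

Keeping the rows as data would let the markdown table be regenerated from one source of truth instead of being edited by hand.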