Update src/about.py

src/about.py: +4 -4

@@ -50,7 +50,7 @@ When you submit a model on the "Submit here!" page, it is automatically evaluate
 
 The GPU used for evaluation is operated with the support of __[Technology Innovation Institute (TII)](https://www.tii.ae/)__.
 
-The datasets used for evaluation
+The datasets used for evaluation consist of datasets that are Arabic Native like the `AlGhafa` benchmark from [TII](https://www.tii.ae/) and `ACVA` benchmark from [FreedomIntelligence](https://huggingface.co/FreedomIntelligence) to assess reasoning, language understanding, commonsense, and more.
 
 More details about the benchmarks and the evaluation process is provided on the “About” page.
 """
@@ -79,19 +79,19 @@ Note : Some models might get selected as a subject of caution by the community,
 
 ## How it works
 📈 We evaluate models using the impressive [LightEval](https://github.com/huggingface/lighteval), a unified and straightforward framework from the HuggingFace Eval Team to test and assess causal language models on a large number of different evaluation tasks.
 
-We have set up a benchmark using datasets, most of them translated to Arabic, and validated by native
+We have set up a benchmark using datasets, most of them translated to Arabic, and validated by native Arabic speakers. We also added `AlGhafa`, a new benchmark prepared from scratch natively for Arabic, alongside the `ACVA` benchmark introduced in the [AceGPT](https://arxiv.org/abs/2309.12053) paper by [FreedomIntelligence](https://huggingface.co/FreedomIntelligence).
 
 Find below the Native benchmarks :
 
 - AlGhafa : Find more details [here](https://aclanthology.org/2023.arabicnlp-1.21.pdf) - (provided by [TII](https://www.tii.ae/))
-- Arabic-Culture-Value-
+- Arabic-Culture-Value-Alignment (ACVA) : Find more details [here](https://arxiv.org/pdf/2309.12053) - (provided by [FreedomIntelligence](https://huggingface.co/FreedomIntelligence))
 
 
 And here find all the translated benchmarks provided by the Language evaluation team at [Technology Innovation Institute](https://www.tii.ae/) :
 
 - `Arabic-MMLU`, `Arabic-EXAMS`, `Arabic-ARC-Challenge`, `Arabic-ARC-Easy`, `Arabic-BOOLQ`, `Arabic-COPA`, `Arabic-HELLASWAG`, `Arabic-OPENBOOK-QA`, `Arabic-PIQA`, `Arabic-RACE`, `Arabic-SCIQ`, `Arabic-TOXIGEN`. All part of the extended version of the AlGhafa benchmark (AlGhafa-T version)
 
-Please, consider reaching out to us through
+Please, consider reaching out to us through the discussions tab if you are working on benchmarks for Arabic LLMs and willing to see them on this leaderboard as well. Your benchmark might change the whole game for Arabic models !
 
 GPUs are provided by __[Technology Innovation Institute (TII)](https://www.tii.ae/)__ for the evaluations.
 