Commit · 12a8a04
1 Parent(s): 3ea6115
2024-02-22 22:53:43 Publish script update
app.py
CHANGED
@@ -15,7 +15,7 @@ Projects compares different large language models and their providers for real t
 While other benchmarks compare LLMs on different human intelligence tasks this benchmark focus on features related to business and engineering aspects such as response times, pricing and data streaming capabilities.
 
 To preform evaluation we chose a task of newspaper articles summarization from [GEM/xlsum](https://huggingface.co/datasets/GEM/xlsum) dataset as it represents a very standard type of task where model has to understand unstructured natural language text, process it and output text in a specified format.
-For this version we chose English
+For this version we chose English and Japanese languages, with Japanese representing languages using logographic alphabets. This enable us also validate the effectiveness of the LLM for different language groups.
 
 Each of the models was asked to summarize the text using the following prompt:
 
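The description in this diff mentions response times and data streaming capabilities as the measured features. As a rough illustration only, the sketch below times a streamed response, recording the time to the first chunk and the total time. It is not taken from app.py: `fake_streaming_model` and `measure_streaming` are hypothetical names, and the stand-in generator replaces a real provider client, which this commit does not show.

```python
import time
from typing import Callable, Iterable, Iterator


def fake_streaming_model(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM client; yields text chunks."""
    for chunk in ["A ", "short ", "summary."]:
        time.sleep(0.01)  # simulate network and generation latency
        yield chunk


def measure_streaming(stream_fn: Callable[[str], Iterable[str]], prompt: str) -> dict:
    """Consume a streamed response and record basic timing figures."""
    start = time.perf_counter()
    first_chunk_at = None
    chunks = []
    for chunk in stream_fn(prompt):
        if first_chunk_at is None:
            # Timestamp of the first streamed chunk
            first_chunk_at = time.perf_counter()
        chunks.append(chunk)
    end = time.perf_counter()
    return {
        "text": "".join(chunks),
        "time_to_first_chunk_s": None if first_chunk_at is None else first_chunk_at - start,
        "total_time_s": end - start,
        "num_chunks": len(chunks),
    }


# The prompt text here is a placeholder, not the benchmark's actual prompt.
result = measure_streaming(fake_streaming_model, "Summarize: ...")
```

Swapping the stand-in for a real client would only require a callable that takes a prompt and yields text chunks; the timing logic stays the same.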