Spaces: bardsai/performance-llm-board
piotr-szleg-bards-ai committed on Feb 22, 2024

Commit 12a8a04 (1 parent: 3ea6115)

2024-02-22 22:53:43 Publish script update

Files changed (1)

app.py CHANGED

@@ -15,7 +15,7 @@ Projects compares different large language models and their providers for real t
 While other benchmarks compare LLMs on different human intelligence tasks this benchmark focus on features related to business and engineering aspects such as response times, pricing and data streaming capabilities.
 To preform evaluation we chose a task of newspaper articles summarization from [GEM/xlsum](https://huggingface.co/datasets/GEM/xlsum) dataset as it represents a very standard type of task where model has to understand unstructured natural language text, process it and output text in a specified format.
-For this version we chose English, Polish and Japanese languages, with Japanese representing languages using logographic alphabets. This enable us also validate the effectiveness of the LLM for different language groups.
+For this version we chose English and Japanese languages, with Japanese representing languages using logographic alphabets. This enable us also validate the effectiveness of the LLM for different language groups.
 Each of the models was asked to summarize the text using the following prompt: