Ludwig Stumpp committed
Commit 9d7638e · 1 Parent(s): 8f06941
Add column for publisher
- README.md +44 -44
- streamlit_app.py +1 -1
README.md
CHANGED
@@ -8,50 +8,50 @@ https://llm-leaderboard.streamlit.app/
## Leaderboard
-
| Model Name | Commercial Use? | Chatbot Arena Elo | HellaSwag (few-shot) | HellaSwag (zero-shot) | HumanEval-Python (pass@1) | LAMBADA (zero-shot) | MMLU (zero-shot) | MMLU (few-shot) | TriviaQA (zero-shot) |
-
| ----------------------------------------------------------------------------------------------------------- | --------------- | ------------------------------------------------ | -------------------------------------------------------------------- | --------------------------------------------- | ------------------------------------------------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------- |
-
| [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | no | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
-
| [bloom-176b](https://huggingface.co/bigscience/bloom) | yes | | [0.744](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | [0.155](https://huggingface.co/bigscience/bloom#results) | | [0.299](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | |
-
| [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | yes | | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | [0.259](https://www.mosaicml.com/blog/mpt-7b) | | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
-
| [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | yes | | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | [0.258](https://www.mosaicml.com/blog/mpt-7b) | | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
-
| [chatglm-6b](https://chatglm.cn/blog) | yes | [985](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
-
| [chinchilla-70b](https://arxiv.org/abs/2203.15556v1) | no | | | [0.808](https://arxiv.org/abs/2203.15556v1) | | [0.774](https://arxiv.org/abs/2203.15556v1) | | [0.675](https://arxiv.org/abs/2203.15556v1) | |
-
| [codex-12b / code-cushman-001](https://arxiv.org/abs/2107.03374) | yes | | | | [0.317](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations) | | | | |
-
| [code-davinci-002](https://arxiv.org/abs/2207.10397v2) | yes | | | | [0.658](https://arxiv.org/abs/2207.10397v2) | | | | |
-
| [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) | yes | | | | [0.293](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [codegen-16B-multi](https://huggingface.co/Salesforce/codegen-16B-multi) | yes | | | | [0.183](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [codegx-13b](http://keg.cs.tsinghua.edu.cn/codegeex/) | no | | | | [0.229](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | yes | [944](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
-
| [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | yes | | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | [0.265](https://www.mosaicml.com/blog/mpt-7b) | | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
-
| [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | yes | | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | [0.253](https://www.mosaicml.com/blog/mpt-7b) | | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
-
| [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | yes | [951](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
-
| [gal-120b](https://arxiv.org/abs/2211.09085v1) | no | | | | | | [0.526](https://paperswithcode.com/paper/galactica-a-large-language-model-for-science-1) | | |
-
| [gpt-3-7b / curie](https://arxiv.org/abs/2005.14165) | yes | | [0.682](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | [0.243](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
-
| [gpt-3-175b / davinci](https://arxiv.org/abs/2005.14165) | yes | | [0.793](https://arxiv.org/abs/2005.14165) | [0.789](https://arxiv.org/abs/2005.14165) | | | | [0.439](https://arxiv.org/abs/2005.14165) | |
-
| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3) | yes | | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | [0.481](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
-
| [gpt-3.5-175b / code-davinci-002](https://platform.openai.com/docs/model-index-for-researchers) | yes | | | | [0.463](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations) | | | | |
-
| [gpt-4](https://arxiv.org/abs/2303.08774v3) | yes | | [0.953](https://arxiv.org/abs/2303.08774v3) | | [0.670](https://arxiv.org/abs/2303.08774v3) | | | [0.864](https://arxiv.org/abs/2303.08774v3) | |
-
| [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | yes | | [0.718](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.719](https://www.mosaicml.com/blog/mpt-7b) | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | [0.269](https://www.mosaicml.com/blog/mpt-7b) | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
-
| [gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | yes | | [0.663](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.683](https://www.mosaicml.com/blog/mpt-7b) | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | [0.261](https://www.mosaicml.com/blog/mpt-7b) | [0.249](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
-
| [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | no | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
-
| [llama-7b](https://arxiv.org/abs/2302.13971) | no | | | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.105](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.302](https://www.mosaicml.com/blog/mpt-7b) | | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
-
| [llama-13b](https://arxiv.org/abs/2302.13971) | no | [932](https://lmsys.org/blog/2023-05-03-arena/) | | [0.792](https://arxiv.org/abs/2302.13971) | [0.158](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [llama-33b](https://arxiv.org/abs/2302.13971) | no | | | [0.828](https://arxiv.org/abs/2302.13971) | [0.217](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [llama-65b](https://arxiv.org/abs/2302.13971) | no | | | [0.842](https://arxiv.org/abs/2302.13971) | [0.237](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | [0.634](https://arxiv.org/abs/2302.13971v1) | |
-
| [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | yes | | | [0.761](https://www.mosaicml.com/blog/mpt-7b) | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | [0.296](https://www.mosaicml.com/blog/mpt-7b) | | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
-
| [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | yes | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
-
| [opt-7b](https://huggingface.co/facebook/opt-6.7b) | no | | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | [0.251](https://www.mosaicml.com/blog/mpt-7b) | | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
-
| [opt-13b](https://huggingface.co/facebook/opt-13b) | no | | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | [0.257](https://www.mosaicml.com/blog/mpt-7b) | | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
-
| [opt-66b](https://huggingface.co/facebook/opt-66b) | no | | [0.745](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
-
| [opt-175b](https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/) | no | | [0.791](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | [0.318](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
-
| [palm-540b](https://arxiv.org/abs/2204.02311v5) | no | | [0.838](https://arxiv.org/abs/2204.02311v5) | [0.834](https://arxiv.org/abs/2204.02311v5) | [0.262](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.779](https://arxiv.org/abs/2204.02311v5) | | [0.693](https://arxiv.org/abs/2204.02311v5) | |
-
| [replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) | yes | | | | [0.219](https://twitter.com/amasad/status/1651019556423598081/photo/2) | | | | |
-
| [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | yes | | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | [0.251](https://www.mosaicml.com/blog/mpt-7b) | | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
-
| [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | no | [858](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
-
| [starcoder-base-16b](https://huggingface.co/bigcode/starcoderbase) | yes | | | | [0.304](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [starcoder-16b](https://huggingface.co/bigcode/starcoder) | yes | | | | [0.336](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [starcoder-16b (prompted)](https://huggingface.co/bigcode/starcoder) | yes | | | | [0.408](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
-
| [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | no | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| Model Name | Publisher | Commercial Use? | Chatbot Arena Elo | HellaSwag (few-shot) | HellaSwag (zero-shot) | HumanEval-Python (pass@1) | LAMBADA (zero-shot) | MMLU (zero-shot) | MMLU (few-shot) | TriviaQA (zero-shot) |
+
| ----------------------------------------------------------------------------------------------------------- | ------------------- | --------------- | ------------------------------------------------ | -------------------------------------------------------------------- | --------------------------------------------- | ------------------------------------------------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------------------------------------- | --------------------------------------------- |
+
| [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | Stanford | no | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| [bloom-176b](https://huggingface.co/bigscience/bloom) | BigScience | yes | | [0.744](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | [0.155](https://huggingface.co/bigscience/bloom#results) | | [0.299](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | |
+
| [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | Cerebras | yes | | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | [0.259](https://www.mosaicml.com/blog/mpt-7b) | | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
+
| [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | Cerebras | yes | | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | [0.258](https://www.mosaicml.com/blog/mpt-7b) | | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
+
| [chatglm-6b](https://chatglm.cn/blog) | ChatGLM | yes | [985](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| [chinchilla-70b](https://arxiv.org/abs/2203.15556v1) | DeepMind | no | | | [0.808](https://arxiv.org/abs/2203.15556v1) | | [0.774](https://arxiv.org/abs/2203.15556v1) | | [0.675](https://arxiv.org/abs/2203.15556v1) | |
+
| [codex-12b / code-cushman-001](https://arxiv.org/abs/2107.03374) | OpenAI | yes | | | | [0.317](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations) | | | | |
+
| [code-davinci-002](https://arxiv.org/abs/2207.10397v2) | OpenAI | yes | | | | [0.658](https://arxiv.org/abs/2207.10397v2) | | | | |
+
| [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) | Salesforce | yes | | | | [0.293](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [codegen-16B-multi](https://huggingface.co/Salesforce/codegen-16B-multi) | Salesforce | yes | | | | [0.183](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [codegx-13b](http://keg.cs.tsinghua.edu.cn/codegeex/) | Tsinghua University | no | | | | [0.229](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | Databricks | yes | [944](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | EleutherAI | yes | | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | [0.265](https://www.mosaicml.com/blog/mpt-7b) | | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
+
| [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | EleutherAI | yes | | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | [0.253](https://www.mosaicml.com/blog/mpt-7b) | | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
+
| [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | lmsys.org | yes | [951](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| [gal-120b](https://arxiv.org/abs/2211.09085v1) | lmsys.org | no | | | | | | [0.526](https://paperswithcode.com/paper/galactica-a-large-language-model-for-science-1) | | |
+
| [gpt-3-7b / curie](https://arxiv.org/abs/2005.14165) | OpenAI | yes | | [0.682](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | [0.243](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
+
| [gpt-3-175b / davinci](https://arxiv.org/abs/2005.14165) | OpenAI | yes | | [0.793](https://arxiv.org/abs/2005.14165) | [0.789](https://arxiv.org/abs/2005.14165) | | | | [0.439](https://arxiv.org/abs/2005.14165) | |
+
| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3) | OpenAI | yes | | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | [0.481](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
+
| [gpt-3.5-175b / code-davinci-002](https://platform.openai.com/docs/model-index-for-researchers) | OpenAI | yes | | | | [0.463](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations) | | | | |
+
| [gpt-4](https://arxiv.org/abs/2303.08774v3) | OpenAI | yes | | [0.953](https://arxiv.org/abs/2303.08774v3) | | [0.670](https://arxiv.org/abs/2303.08774v3) | | | [0.864](https://arxiv.org/abs/2303.08774v3) | |
+
| [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | EleutherAI | yes | | [0.718](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.719](https://www.mosaicml.com/blog/mpt-7b) | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | [0.269](https://www.mosaicml.com/blog/mpt-7b) | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
+
| [gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | EleutherAI | yes | | [0.663](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.683](https://www.mosaicml.com/blog/mpt-7b) | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | [0.261](https://www.mosaicml.com/blog/mpt-7b) | [0.249](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
+
| [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | Berkeley BAIR | no | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| [llama-7b](https://arxiv.org/abs/2302.13971) | Meta AI | no | | | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.105](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.302](https://www.mosaicml.com/blog/mpt-7b) | | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
+
| [llama-13b](https://arxiv.org/abs/2302.13971) | Meta AI | no | [932](https://lmsys.org/blog/2023-05-03-arena/) | | [0.792](https://arxiv.org/abs/2302.13971) | [0.158](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [llama-33b](https://arxiv.org/abs/2302.13971) | Meta AI | no | | | [0.828](https://arxiv.org/abs/2302.13971) | [0.217](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [llama-65b](https://arxiv.org/abs/2302.13971) | Meta AI | no | | | [0.842](https://arxiv.org/abs/2302.13971) | [0.237](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | [0.634](https://arxiv.org/abs/2302.13971v1) | |
+
| [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | MosaicML | yes | | | [0.761](https://www.mosaicml.com/blog/mpt-7b) | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | [0.296](https://www.mosaicml.com/blog/mpt-7b) | | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
+
| [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | Open Assistant | yes | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| [opt-7b](https://huggingface.co/facebook/opt-6.7b) | Meta AI | no | | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | [0.251](https://www.mosaicml.com/blog/mpt-7b) | | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
+
| [opt-13b](https://huggingface.co/facebook/opt-13b) | Meta AI | no | | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | [0.257](https://www.mosaicml.com/blog/mpt-7b) | | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
+
| [opt-66b](https://huggingface.co/facebook/opt-66b) | Meta AI | no | | [0.745](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
+
| [opt-175b](https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/) | Meta AI | no | | [0.791](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | [0.318](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | |
+
| [palm-540b](https://arxiv.org/abs/2204.02311v5) | Google Research | no | | [0.838](https://arxiv.org/abs/2204.02311v5) | [0.834](https://arxiv.org/abs/2204.02311v5) | [0.262](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | [0.779](https://arxiv.org/abs/2204.02311v5) | | [0.693](https://arxiv.org/abs/2204.02311v5) | |
+
| [replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) | Replit | yes | | | | [0.219](https://twitter.com/amasad/status/1651019556423598081/photo/2) | | | | |
+
| [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | Stability AI | yes | | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | [0.251](https://www.mosaicml.com/blog/mpt-7b) | | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
+
| [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | Stability AI | no | [858](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
+
| [starcoder-base-16b](https://huggingface.co/bigcode/starcoderbase) | BigCode | yes | | | | [0.304](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [starcoder-16b](https://huggingface.co/bigcode/starcoder) | BigCode | yes | | | | [0.336](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [starcoder-16b (prompted)](https://huggingface.co/bigcode/starcoder) | BigCode | yes | | | | [0.408](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view) | | | | |
+
| [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | lmsys.org | no | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |
## Benchmarks
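For readers who want to consume the updated table outside the app, the following is a minimal sketch of splitting one README row into the eleven columns introduced by this commit. It uses only the standard library; the helper names `strip_link` and `parse_row` are hypothetical and are not taken from the repository, whose own parsing code is not shown in this diff.

```python
import re

# Column order after this commit, copied from the new table header.
COLUMNS = [
    "Model Name", "Publisher", "Commercial Use?", "Chatbot Arena Elo",
    "HellaSwag (few-shot)", "HellaSwag (zero-shot)", "HumanEval-Python (pass@1)",
    "LAMBADA (zero-shot)", "MMLU (zero-shot)", "MMLU (few-shot)", "TriviaQA (zero-shot)",
]

# Matches a cell that consists of a single markdown link: [text](url)
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def strip_link(cell: str) -> str:
    """Return the link text of a `[text](url)` cell, or the cell unchanged."""
    match = LINK.fullmatch(cell)
    return match.group(1) if match else cell


def parse_row(line: str) -> dict:
    """Split one markdown table row into a {column name: value} dict."""
    inner = line.strip()
    # Drop exactly one leading and one trailing pipe so empty cells survive.
    if inner.startswith("|"):
        inner = inner[1:]
    if inner.endswith("|"):
        inner = inner[:-1]
    cells = [cell.strip() for cell in inner.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return {name: strip_link(cell) for name, cell in zip(COLUMNS, cells)}


# Row copied from the new side of the diff (padding whitespace shortened).
row = parse_row(
    "| [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | Stanford | no "
    "| [1008](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | |"
)
print(row["Model Name"], row["Publisher"], row["Chatbot Arena Elo"])
```

A row from before this commit has only ten cells, so `parse_row` raises `ValueError` on it rather than silently shifting every value one column to the left.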
streamlit_app.py
CHANGED
@@ -146,7 +146,7 @@ def setup_leaderboard(readme: str):
     df_leaderboard["Commercial Use?"] = df_leaderboard["Commercial Use?"].map({"yes": 1, "no": 0}).astype(bool)

     st.markdown("## Leaderboard")
-    st.dataframe(filter_dataframe(df_leaderboard, ignore_columns=["Commercial Use?"]))
+    st.dataframe(filter_dataframe(df_leaderboard, ignore_columns=["Commercial Use?", "Publisher"]))


 def setup_benchmarks(readme: str):
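The body of `filter_dataframe` is not part of this diff, so the effect of the one-line change can only be sketched under an assumption: that `ignore_columns` names columns to leave out of the filter controls while they stay in the displayed table. The helper `filterable_columns` below is hypothetical and only illustrates that reading.

```python
from typing import Iterable, List


def filterable_columns(columns: Iterable[str], ignore_columns: Iterable[str] = ()) -> List[str]:
    """Return the columns still offered as filters, in their original order.

    Assumption: ignored columns are skipped for filtering only; this helper
    is a stand-in, not the repository's filter_dataframe.
    """
    ignored = set(ignore_columns)
    return [name for name in columns if name not in ignored]


# A shortened column list taken from the new table header.
columns = ["Model Name", "Publisher", "Commercial Use?", "Chatbot Arena Elo"]

before = filterable_columns(columns, ignore_columns=["Commercial Use?"])
after = filterable_columns(columns, ignore_columns=["Commercial Use?", "Publisher"])
print(before)
print(after)
```

Under this reading, the commit adds "Publisher" to the table while keeping it out of the filter controls, alongside "Commercial Use?".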