Ludwig Stumpp committed
Commit eedd6a6 · 1 parent: 7e2df21
Add WinoGrande few shot results for gpt4 and 3.5
README.md
CHANGED
@@ -28,9 +28,9 @@ https://llm-leaderboard.streamlit.app/
 | [gal-120b](https://arxiv.org/abs/2211.09085v1) | Lmsys.org | no | | | | | | | | [0.526](https://paperswithcode.com/paper/galactica-a-large-language-model-for-science-1) | | | | | |
 | [gpt-3-7b / curie](https://arxiv.org/abs/2005.14165) | OpenAI | yes | | [0.682](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | | | | [0.243](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | |
 | [gpt-3-175b / davinci](https://arxiv.org/abs/2005.14165) | OpenAI | yes | | [0.793](https://arxiv.org/abs/2005.14165) | [0.789](https://arxiv.org/abs/2005.14165) | | | | | | [0.439](https://arxiv.org/abs/2005.14165) | | | | |
-| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3) | OpenAI | yes | | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | [0.481](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | |
+| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3) | OpenAI | yes | | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | [0.481](https://arxiv.org/abs/2303.08774v3) | [0.762](https://arxiv.org/abs/2303.08774v3) | | | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | | | | [0.816](https://arxiv.org/abs/2303.08774v3) |
 | [gpt-3.5-175b / code-davinci-002](https://platform.openai.com/docs/model-index-for-researchers) | OpenAI | yes | | | | | [0.463](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations) | | | | | | | | |
-| [gpt-4](https://arxiv.org/abs/2303.08774v3) | OpenAI | yes | | [0.953](https://arxiv.org/abs/2303.08774v3) | | | [0.670](https://arxiv.org/abs/2303.08774v3) | | | | [0.864](https://arxiv.org/abs/2303.08774v3) | | | |
+| [gpt-4](https://arxiv.org/abs/2303.08774v3) | OpenAI | yes | | [0.953](https://arxiv.org/abs/2303.08774v3) | | | [0.670](https://arxiv.org/abs/2303.08774v3) | | | | [0.864](https://arxiv.org/abs/2303.08774v3) | | | | [0.875](https://arxiv.org/abs/2303.08774v3) |
 | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | EleutherAI | yes | | [0.718](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.719](https://www.mosaicml.com/blog/mpt-7b) | | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | | [0.269](https://www.mosaicml.com/blog/mpt-7b) | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.347](https://www.mosaicml.com/blog/mpt-7b) | | | |
 | [gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | EleutherAI | yes | | [0.663](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.683](https://www.mosaicml.com/blog/mpt-7b) | | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | | [0.261](https://www.mosaicml.com/blog/mpt-7b) | [0.249](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.234](https://www.mosaicml.com/blog/mpt-7b) | | | |
 | [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | Berkeley BAIR | no | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | | | | | | | | | | | |