ludwigstumpp/llm-leaderboard

Ludwig Stumpp committed on May 11, 2023

Commit eedd6a6 (1 parent: 7e2df21)

Add WinoGrande few shot results for gpt4 and 3.5

Files changed (1)

README.md +2 -2

@@ -28,9 +28,9 @@ https://llm-leaderboard.streamlit.app/
 | [gal-120b](https://arxiv.org/abs/2211.09085v1)                                                              | Lmsys.org           | no              |                                                  |                                                                      |                                               |                                                                 |                                                                                 |                                               |                                                                 | [0.526](https://paperswithcode.com/paper/galactica-a-large-language-model-for-science-1) |                                                                      |                                               |                                                                 |                                                                 |                                                                 |
 | [gpt-3-7b / curie](https://arxiv.org/abs/2005.14165)                                                        | OpenAI              | yes             |                                                  | [0.682](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          | [0.243](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                 |                                                                 |
 | [gpt-3-175b / davinci](https://arxiv.org/abs/2005.14165)                                                    | OpenAI              | yes             |                                                  | [0.793](https://arxiv.org/abs/2005.14165)                            | [0.789](https://arxiv.org/abs/2005.14165)     |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          | [0.439](https://arxiv.org/abs/2005.14165)                            |                                               |                                                                 |                                                                 |                                                                 |
-| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3)                                       | OpenAI              | yes             |                                                  | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 | [0.481](https://arxiv.org/abs/2303.08774v3)                                     | [0.762](https://arxiv.org/abs/2303.08774v3)   |                                                                 |                                                                                          | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                 |                                                                 |
 | [gpt-3.5-175b / code-davinci-002](https://platform.openai.com/docs/model-index-for-researchers)             | OpenAI              | yes             |                                                  |                                                                      |                                               |                                                                 | [0.463](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations)      |                                               |                                                                 |                                                                                          |                                                                      |                                               |                                                                 |                                                                 |                                                                 |
-| [gpt-4](https://arxiv.org/abs/2303.08774v3)                                                                 | OpenAI              | yes             |                                                  | [0.953](https://arxiv.org/abs/2303.08774v3)                          |                                               |                                                                 | [0.670](https://arxiv.org/abs/2303.08774v3)                                     |                                               |                                                                 |                                                                                          | [0.864](https://arxiv.org/abs/2303.08774v3)                          |                                               |                                                                 |                                                                 |                                                                 |
 | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b)                                              | EleutherAI          | yes             |                                                  | [0.718](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.719](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                                 | [0.719](https://www.mosaicml.com/blog/mpt-7b) |                                                                 | [0.269](https://www.mosaicml.com/blog/mpt-7b)                                            | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                 |                                                                 |
 | [gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b)                                                      | EleutherAI          | yes             |                                                  | [0.663](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.683](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                                 | [0.683](https://www.mosaicml.com/blog/mpt-7b) |                                                                 | [0.261](https://www.mosaicml.com/blog/mpt-7b)                                            | [0.249](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                 |                                                                 |
 | [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/)                                               | Berkeley BAIR       | no              | [1082](https://lmsys.org/blog/2023-05-03-arena/) |                                                                      |                                               |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          |                                                                      |                                               |                                                                 |                                                                 |                                                                 |

 | [gal-120b](https://arxiv.org/abs/2211.09085v1)                                                              | Lmsys.org           | no              |                                                  |                                                                      |                                               |                                                                 |                                                                                 |                                               |                                                                 | [0.526](https://paperswithcode.com/paper/galactica-a-large-language-model-for-science-1) |                                                                      |                                               |                                                                 |                                                                 |                                                                 |
 | [gpt-3-7b / curie](https://arxiv.org/abs/2005.14165)                                                        | OpenAI              | yes             |                                                  | [0.682](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          | [0.243](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                 |                                                                 |
 | [gpt-3-175b / davinci](https://arxiv.org/abs/2005.14165)                                                    | OpenAI              | yes             |                                                  | [0.793](https://arxiv.org/abs/2005.14165)                            | [0.789](https://arxiv.org/abs/2005.14165)     |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          | [0.439](https://arxiv.org/abs/2005.14165)                            |                                               |                                                                 |                                                                 |                                                                 |
+| [gpt-3.5-175b / text-davinci-003](https://arxiv.org/abs/2303.08774v3)                                       | OpenAI              | yes             |                                                  | [0.822](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 | [0.481](https://arxiv.org/abs/2303.08774v3)                                     | [0.762](https://arxiv.org/abs/2303.08774v3)   |                                                                 |                                                                                          | [0.569](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) |                                               |                                                                 |                                                                 | [0.816](https://arxiv.org/abs/2303.08774v3)                     |
 | [gpt-3.5-175b / code-davinci-002](https://platform.openai.com/docs/model-index-for-researchers)             | OpenAI              | yes             |                                                  |                                                                      |                                               |                                                                 | [0.463](https://crfm.stanford.edu/helm/latest/?group=targeted_evaluations)      |                                               |                                                                 |                                                                                          |                                                                      |                                               |                                                                 |                                                                 |                                                                 |
+| [gpt-4](https://arxiv.org/abs/2303.08774v3)                                                                 | OpenAI              | yes             |                                                  | [0.953](https://arxiv.org/abs/2303.08774v3)                          |                                               |                                                                 | [0.670](https://arxiv.org/abs/2303.08774v3)                                     |                                               |                                                                 |                                                                                          | [0.864](https://arxiv.org/abs/2303.08774v3)                          |                                               |                                                                 |                                                                 | [0.875](https://arxiv.org/abs/2303.08774v3)                     |
 | [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b)                                              | EleutherAI          | yes             |                                                  | [0.718](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.719](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                                 | [0.719](https://www.mosaicml.com/blog/mpt-7b) |                                                                 | [0.269](https://www.mosaicml.com/blog/mpt-7b)                                            | [0.276](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                 |                                                                 |
 | [gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b)                                                      | EleutherAI          | yes             |                                                  | [0.663](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.683](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                                 | [0.683](https://www.mosaicml.com/blog/mpt-7b) |                                                                 | [0.261](https://www.mosaicml.com/blog/mpt-7b)                                            | [0.249](https://crfm.stanford.edu/helm/latest/?group=core_scenarios) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |                                                                 |                                                                 |                                                                 |
 | [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/)                                               | Berkeley BAIR       | no              | [1082](https://lmsys.org/blog/2023-05-03-arena/) |                                                                      |                                               |                                                                 |                                                                                 |                                               |                                                                 |                                                                                          |                                                                      |                                               |                                                                 |                                                                 |                                                                 |