Update README.md
README.md (changed)
@@ -32,27 +32,41 @@ This repository provides large language models developed by [TokyoTech-LLM](http
### MT-Bench JA

#### Turn-Wise Performance

We report the overall score (i.e., the average of the first- and second-turn scores) as well as the first-turn and second-turn scores separately.
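The turn averaging described above can be sketched as follows; this is a minimal illustration with made-up numbers, and the function name is hypothetical (not part of the benchmark's tooling):

```python
# Minimal sketch: an overall MT-Bench score for a category is the
# average of its first- and second-turn scores.
# Function name and sample inputs are hypothetical, for illustration only.
def overall_score(first_turn: float, second_turn: float) -> float:
    return (first_turn + second_turn) / 2

# Example with made-up per-category turn scores:
print(round(overall_score(0.48, 0.26), 4))  # 0.37
```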

##### Overall

|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
|---|---|---|---|---|---|---|---|---|---|
| Swallow-MS-7b-instruct-v0.1 |0.3411|0.3770|0.4290|0.3454|0.1040|0.2400|0.3677|0.3907|0.4750|
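For this row, the Average cell coincides with the arithmetic mean of the eight category scores; a quick sanity check (the simple-mean rule is our assumption, not stated in the source):

```python
# Sanity check: the overall row's Average (0.3411) matches the
# arithmetic mean of its eight category scores.
# Simple (unweighted) mean is an assumption, for illustration only.
categories = [0.3770, 0.4290, 0.3454, 0.1040, 0.2400, 0.3677, 0.3907, 0.4750]
mean = sum(categories) / len(categories)
print(round(mean, 4))  # 0.3411
```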

##### First Turn

|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
|---|---|---|---|---|---|---|---|---|---|
| Swallow-MS-7b-instruct-v0.1 |0.3699|0.4880|0.4260|0.3900|0.1080|0.2364|0.3780|0.4500|0.4800|

##### Second Turn

|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
|---|---|---|---|---|---|---|---|---|---|
| Swallow-MS-7b-instruct-v0.1 |0.3130|0.2624|0.4320|0.2996|0.1000|0.2430|0.3564|0.3291|0.4700|

#### Comparison to Past Models

We provide only the overall scores in this section.

|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
|---|---|---|---|---|---|---|---|---|---|
| Swallow-MS-7b-instruct-v0.1 |0.3411|0.3770|0.4290|0.3454|0.1040|0.2400|0.3677|0.3907|0.4750|
| ELYZA-japanese-Llama-2-7b-fast-instruct |0.2827|0.3289|0.3907|0.2424|0.1480|0.1584|0.3511|0.3053|0.3365|
| calm2-7b-chat |0.3204|0.4657|0.4898|0.1837|0.1005|0.1414|0.3927|0.3601|0.4293|
| calm2-7b-chat-dpo-experimental |0.3493|0.5312|0.5237|0.1857|0.1000|0.1813|0.3355|0.4320|0.5051|
| RakutenAI-7B-instruct |0.2994|0.3623|0.3711|0.3333|0.1763|0.1581|0.4215|0.2824|0.2901|
| RakutenAI-7B-chat |0.3667|0.4229|0.4644|0.3990|0.2161|0.2390|0.3416|0.3904|0.4601|

## Evaluation Benchmarks