Update README.md
Add detailed turn scores
README.md
CHANGED
@@ -32,12 +32,27 @@ This repository provides large language models developed by [TokyoTech-LLM](http

 ### MT-Bench JA

+* We report overall (i.e., average over scores of the first and second turns), first, and second turn scores.
 * We will add the scores of existing models soon.

+#### Overall
+
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
 | Swallow-MS-7b-instruct-v0.1 |0.3411|0.3770|0.4290|0.3454|0.1040|0.2400|0.3677|0.3907|0.4750|

+#### First Turn
+
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-MS-7b-instruct-v0.1 |0.3699|0.4880|0.4260|0.3900|0.1080|0.2364|0.3780|0.4500|0.4800|
+
+#### Second Turn
+
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-MS-7b-instruct-v0.1 |0.3130|0.2624|0.4320|0.2996|0.1000|0.2430|0.3564|0.3291|0.4700|
+

 ## Evaluation Benchmarks

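
The added note defines the overall score as the average of the first and second turn scores. A minimal Python sketch of that arithmetic follows, using the per-category values from the tables in this diff. It assumes an unweighted mean of the two per-turn scores, which the README does not confirm, and the names `mean_of_turns` and `compare` are hypothetical helpers. With these numbers the unweighted mean matches the reported overall scores to within about 0.002 but not always exactly (Writing, for instance). One possible explanation is that the overall score is averaged over individual judgments rather than over the two per-turn means, but the source does not say.

```python
# Per-category scores for Swallow-MS-7b-instruct-v0.1, copied from the tables in the diff.
CATEGORIES = ["Writing", "Roleplay", "Reasoning", "Math",
              "Coding", "Extraction", "STEM", "Humanities"]
FIRST_TURN = [0.4880, 0.4260, 0.3900, 0.1080, 0.2364, 0.3780, 0.4500, 0.4800]
SECOND_TURN = [0.2624, 0.4320, 0.2996, 0.1000, 0.2430, 0.3564, 0.3291, 0.4700]
REPORTED_OVERALL = [0.3770, 0.4290, 0.3454, 0.1040, 0.2400, 0.3677, 0.3907, 0.4750]


def mean_of_turns(first, second):
    """Unweighted mean of first- and second-turn scores, per category (an assumption)."""
    return [(a + b) / 2 for a, b in zip(first, second)]


def compare():
    """Return (category, computed mean, reported overall, absolute gap) rows."""
    means = mean_of_turns(FIRST_TURN, SECOND_TURN)
    return [
        (name, round(m, 4), reported, round(abs(m - reported), 4))
        for name, m, reported in zip(CATEGORIES, means, REPORTED_OVERALL)
    ]


if __name__ == "__main__":
    for name, mean, reported, gap in compare():
        print(f"{name:<11} mean={mean:.4f} reported={reported:.4f} gap={gap:.4f}")
```

Running the script prints one row per category, which makes it easy to see where the simple mean and the reported overall score diverge.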