Automatic Speech Recognition
Transformers
Safetensors
Japanese
whisper
audio
hf-asr-leaderboard
Eval Results
asahi417 committed on
Commit ee89219 · verified · 1 Parent(s): 16d4df0

Update README.md

Files changed (1)
  1. README.md +24 -16
README.md CHANGED
@@ -85,28 +85,36 @@ Kotoba-whisper-v2.0 achieves better CER and WER than the [openai/whisper-large-v
  from ReazonSpeech, and achieves competitive CER and WER on the out-of-domain test sets including [JSUT basic 5000](https://sites.google.com/site/shinnosuketakamichi/publication/jsut) and
  the Japanese subset from [CommonVoice 8.0](https://huggingface.co/datasets/common_voice) (see [Evaluation](#evaluation) for detail).

+
  - ***CER***

- | Model | [CommonVoice 8.0](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT basic5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech Test](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
- |:---------------------------------------------------------------------------------------------|------:|------:|------:|
- | [**kotoba-tech/kotoba-whisper-v2.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 9.20 | 8.40 | **11.63** |
- | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 9.44 | 8.48 | 12.60 |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | **8.52** | **7.18** | 15.18 |
- | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.34 | 9.87 | 29.56 |
- | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.26 | 14.22 | 34.29 |
- | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 46.86 | 35.69 | 96.69 |
+ | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
+ |:---------------------------------------------------------------------------------------------|------:|------:|------:|
+ | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 9.2 | 8.4 | 11.6 |
+ | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 9.4 | 8.5 | 12.2 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 8.5 | 7.1 | 14.9 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 9.7 | 8.2 | 28.1 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 10 | 8.9 | 34.1 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.5 | 10 | 33.2 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 28.6 | 24.9 | 70.4 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.1 | 14.2 | 41.5 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 53.7 | 36.5 | 137.9 |


  - ***WER***

- | Model | [CommonVoice 8.0](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT basic5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech Test](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
- |:---------------------------------------------------------------------------------------------|------:|------:|------:|
- | [**kotoba-tech/kotoba-whisper-v2.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 58.8 | 63.7 | **55.6** |
- | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 59.27 | 64.36 | 56.62 |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | **55.41** | **59.34** | 60.23 |
- | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.64 | 69.52 | 76.04 |
- | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.21 | 82.02 | 82.99 |
- | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.78 | 97.72 | 94.85 |
+ | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
+ |:---------------------------------------------------------------------------------------------|------:|------:|------:|
+ | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 58.8 | 63.7 | 55.6 |
+ | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 59.2 | 64.3 | 56.4 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 55.1 | 59.2 | 60.2 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 59.3 | 63.2 | 74.1 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 61.1 | 66.4 | 74.9 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.4 | 69.5 | 76 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 87.2 | 93 | 91.8 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.2 | 81.9 | 83 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.8 | 97.6 | 94.9 |
+

  - ***Latency***: As kotoba-whisper uses the same architecture as [distil-whisper/distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3),
  it inherits the benefit of the improved latency compared to [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
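The CER and WER figures in the tables above are percentages on the linked test sets. As a rough, hypothetical illustration (not part of this commit), the sketch below shows how a CER score of this kind could be computed with the `transformers` ASR pipeline and the `evaluate` metric. The dataset split and column names (`test`, `audio`, `transcription`) are assumptions inferred from the dataset links, and the WER numbers would additionally require Japanese word segmentation, which this sketch does not attempt.

```python
# Hypothetical CER evaluation sketch for kotoba-tech/kotoba-whisper-v2.0.
# Split and column names ("test", "audio", "transcription") are assumptions,
# not confirmed by the model card.
import torch
from datasets import load_dataset
from evaluate import load
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Same pipeline interface as the distil-whisper / openai whisper checkpoints.
pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-v2.0",
    torch_dtype=dtype,
    device=device,
)

# One of the evaluation sets linked in the tables above.
dataset = load_dataset("japanese-asr/ja_asr.jsut_basic5000", split="test")

cer_metric = load("cer")
predictions, references = [], []
for sample in dataset:
    out = pipe(sample["audio"], generate_kwargs={"language": "ja", "task": "transcribe"})
    predictions.append(out["text"])
    references.append(sample["transcription"])

# `evaluate` returns a fraction; the tables above report percentages.
print("CER (%):", 100 * cer_metric.compute(predictions=predictions, references=references))
```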