Automatic Speech Recognition
Transformers
Safetensors
Japanese
whisper
audio
hf-asr-leaderboard
Eval Results
asahi417 committed on
Commit ee89219 · verified · 1 Parent(s): 16d4df0

Update README.md

Files changed (1)
  1. README.md +24 -16
README.md CHANGED
@@ -85,28 +85,36 @@ Kotoba-whisper-v2.0 achieves better CER and WER than the [openai/whisper-large-v
  from ReazonSpeech, and achieves competitive CER and WER on the out-of-domain test sets including [JSUT basic 5000](https://sites.google.com/site/shinnosuketakamichi/publication/jsut) and
  the Japanese subset from [CommonVoice 8.0](https://huggingface.co/datasets/common_voice) (see [Evaluation](#evaluation) for detail).

+
  - ***CER***

- | Model | [CommonVoice 8.0](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT basic5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech Test](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
- |:---------------------------------------------------------------------------------------------|------:|------:|------:|
- | [**kotoba-tech/kotoba-whisper-v2.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 9.20 | 8.40 | **11.63** |
- | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 9.44 | 8.48 | 12.60 |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | **8.52** | **7.18** | 15.18 |
- | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.34 | 9.87 | 29.56 |
- | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.26 | 14.22 | 34.29 |
- | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 46.86 | 35.69 | 96.69 |
+ | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
+ |:---------------------------------------------------------------------------------------------|------:|------:|------:|
+ | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 9.2 | 8.4 | 11.6 |
+ | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 9.4 | 8.5 | 12.2 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 8.5 | 7.1 | 14.9 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 9.7 | 8.2 | 28.1 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 10 | 8.9 | 34.1 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 11.5 | 10 | 33.2 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 28.6 | 24.9 | 70.4 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 15.1 | 14.2 | 41.5 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 53.7 | 36.5 | 137.9 |


  - ***WER***

- | Model | [CommonVoice 8.0](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT basic5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech Test](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
- |:---------------------------------------------------------------------------------------------|------:|------:|------:|
- | [**kotoba-tech/kotoba-whisper-v2.0**](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 58.8 | 63.7 | **55.6** |
- | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 59.27 | 64.36 | 56.62 |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | **55.41** | **59.34** | 60.23 |
- | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.64 | 69.52 | 76.04 |
- | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.21 | 82.02 | 82.99 |
- | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.78 | 97.72 | 94.85 |
+ | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
+ |:---------------------------------------------------------------------------------------------|------:|------:|------:|
+ | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 58.8 | 63.7 | 55.6 |
+ | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 59.2 | 64.3 | 56.4 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 55.1 | 59.2 | 60.2 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 59.3 | 63.2 | 74.1 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 61.1 | 66.4 | 74.9 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 63.4 | 69.5 | 76 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 87.2 | 93 | 91.8 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 74.2 | 81.9 | 83 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 93.8 | 97.6 | 94.9 |
+

  - ***Latency***: As kotoba-whisper uses the same architecture as [distil-whisper/distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3),
  it inherits the benefit of the improved latency compared to [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
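The CER and WER figures in the tables above are percentages on the linked test sets. As a rough, hypothetical illustration (not part of this commit), the sketch below shows how a CER score of this kind could be computed with the `transformers` ASR pipeline and the `evaluate` metric. The dataset split and column names (`test`, `audio`, `transcription`) are assumptions inferred from the dataset links, and the WER numbers would additionally require Japanese word segmentation, which this sketch does not attempt.

```python
# Hypothetical CER evaluation sketch for kotoba-tech/kotoba-whisper-v2.0.
# Split and column names ("test", "audio", "transcription") are assumptions,
# not confirmed by the model card.
import torch
from datasets import load_dataset
from evaluate import load
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Same pipeline interface as the distil-whisper / openai whisper checkpoints.
pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-v2.0",
    torch_dtype=dtype,
    device=device,
)

# One of the evaluation sets linked in the tables above.
dataset = load_dataset("japanese-asr/ja_asr.jsut_basic5000", split="test")

cer_metric = load("cer")
predictions, references = [], []
for sample in dataset:
    out = pipe(sample["audio"], generate_kwargs={"language": "ja", "task": "transcribe"})
    predictions.append(out["text"])
    references.append(sample["transcription"])

# `evaluate` returns a fraction; the tables above report percentages.
print("CER (%):", 100 * cer_metric.compute(predictions=predictions, references=references))
```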