open_asr_leaderboard

wasertech committed on Sep 11, 2023

Commit f25ec37
1 Parent(s): 651bbfe

Update constants.py

Files changed (1)

constants.py +1 -1


@@ -112,7 +112,7 @@ Models are sorted by consistancy in their results across testsets. (by increasin
 ### Results
 The CommonVoice Test provides a Word Error Rate (WER) within a 20-point margin of the average WER. While not perfect, this indicates that CommonVoice can be a useful tool for quickly identifying a suitable ASR model for a wide range of languages in a programmatic manner. However, it's important to note that it is not sufficient as the sole criterion for choosing the most appropriate architecture. Further considerations may be needed depending on the specific requirements of your ASR application.
-Moreover, it's worth noting that selecting the model with the lowest WER on CommonVoice aligns with choosing the model based on the lowest average WER. This approach proves effective for ranking the best-performing models with precision. However, it's essential to acknowledge that as the average WER increases, the spread of results becomes more pronounced. This can pose challenges in reliably identifying the worst-performing models. The test split size of CommonVoice for a given language is a crucial factor in this context, and it's worth considering. This insight highlights the need for a nuanced approach to ASR model selection, considering various factors, including dataset characteristics, to ensure a comprehensive evaluation of ASR model performance.
+Furthermore, it's important to highlight that opting for the model with the lowest WER on CommonVoice typically aligns closely with selecting a model based on the lowest average WER. This approach has consistently proven effective in pinpointing the best-performing models, even if there is a minor 0.01 point différence (with this data). This slight variance could potentially be attributed to statistical noise and does not diminish the precision of our technique in identifying models with low WER.
 Additionally, it has come to our attention that Nvidia's models, trained using NeMo with custom splits from common datasets, including Common Voice, may have had an advantage due to their familiarity with parts of the Common Voice test set. It's important to note that this highlights the need for greater transparency in data usage, as OpenAI itself does not publish the data they used for training. This could explain their strong performance in the results. Transparency in model training and dataset usage is crucial for fair comparisons in the ASR field and ensuring that results align with real-world scenarios.
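
The claim in the changed paragraph (that picking the lowest Common Voice WER usually picks the same model as the lowest average WER, with Common Voice staying within a 20-point margin of the average) can be checked with a few lines of Python. The sketch below is illustrative only: the model names, test set names, and WER values are hypothetical and are not leaderboard results, and it assumes the average is taken over all test sets, Common Voice included.

```python
from statistics import mean

# Hypothetical WER values in percent, per test set. These are NOT leaderboard results.
wer_by_model = {
    "model_a": {"common_voice": 10.0, "librispeech": 4.0, "tedlium": 7.0},
    "model_b": {"common_voice": 12.0, "librispeech": 3.0, "tedlium": 6.5},
    "model_c": {"common_voice": 25.0, "librispeech": 9.0, "tedlium": 14.0},
}


def average_wer(scores):
    """Mean WER over every test set (assumption: Common Voice is included)."""
    return mean(scores.values())


def best_by_common_voice(table):
    """Name of the model with the lowest Common Voice WER."""
    return min(table, key=lambda name: table[name]["common_voice"])


def best_by_average(table):
    """Name of the model with the lowest average WER."""
    return min(table, key=lambda name: average_wer(table[name]))


def max_gap_to_average(table):
    """Largest absolute gap between Common Voice WER and average WER."""
    return max(abs(s["common_voice"] - average_wer(s)) for s in table.values())


if __name__ == "__main__":
    print("best by Common Voice:", best_by_common_voice(wer_by_model))
    print("best by average WER: ", best_by_average(wer_by_model))
    print("max gap to average:  ", max_gap_to_average(wer_by_model))
```

If the two selections disagree on real data, the size of the gap between the two candidates' average WERs shows whether the disagreement is within the noise the paragraph describes.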