---
library_name: transformers
license: openrail
datasets:
- alexandrainst/coral
language:
- da
metrics:
- wer
- cer
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
model-index:
- name: coral-1-whisper-large
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: CoRal read-aloud
      type: alexandrainst/coral
      split: test
      args: read_aloud
    metrics:
    - type: cer
      value: 4.3% ± 0.2%
      name: CER
    - type: wer
      value: 10.4% ± 0.3%
      name: WER
---

# Whisper-Large v.3 trained on CoRal release 1

This is a state-of-the-art Danish speech recognition model, trained by [Alvenir](https://www.alvenir.ai/).
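
A minimal usage sketch follows, assuming the model is loaded through the generic `automatic-speech-recognition` pipeline from the `transformers` library named in the metadata. The helper names `load_transcriber` and `transcribe` are illustrative and not part of any library, and the settings are library defaults rather than the configuration used for the reported results.

```python
from typing import Any

# Repository name taken from this model card.
MODEL_ID = "Alvenir/coral-1-whisper-large"


def load_transcriber(model_id: str = MODEL_ID) -> Any:
    """Build a speech recognition pipeline (downloads the weights on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, transcriber: Any) -> str:
    """Transcribe one audio file and return the recognised text."""
    # The pipeline returns a dict whose "text" entry holds the transcription.
    result = transcriber(audio_path)
    return result["text"].strip()
```

For example, `transcribe("speech.wav", load_transcriber())` would return the transcription of a Danish recording, where `speech.wav` is a hypothetical file path. Loading the pipeline once and reusing it avoids reloading the roughly 1.5B parameters for every file.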

## Evaluation Results

| Model | Number of parameters | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) CER | [CoRal](https://huggingface.co/datasets/alexandrainst/coral/viewer/read_aloud/test) WER |
|:---|---:|---:|---:|
| [Alvenir/coral-1-whisper-large](https://huggingface.co/Alvenir/coral-1-whisper-large) | 1540M | **4.3% ± 0.2%** | **10.4% ± 0.3%** |
| [alexandrainst/roest-315m](https://huggingface.co/alexandrainst/roest-315m) | 315M | 6.6% ± 0.2% | 17.0% ± 0.4% |
| [mhenrichsen/hviske-v2](https://huggingface.co/syvai/hviske-v2) | 1540M | 4.7% ± 0.07% | 11.8% ± 0.3% |
| [openai/whisper-large-v3](https://hf.co/openai/whisper-large-v3) | 1540M | 11.4% ± 0.3% | 28.3% ± 0.6% |

Results for more models and datasets are available in the [model card for Røst-315m](https://huggingface.co/alexandrainst/roest-315m).
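
For readers unfamiliar with the two metrics, the sketch below shows the standard definitions: the word error rate (WER) and character error rate (CER) are the edit distance between reference and hypothesis, counted over words or characters, divided by the reference length. It is an illustration only. The text normalisation and the procedure behind the ± intervals in the table are not described in this card, so the sketch does not reproduce them, and the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction, using whitespace tokenisation."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction, counting every character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With these definitions, `wer("jeg bor i aarhus", "jeg bor aarhus")` is 0.25, since one of the four reference words is missing. The functions return fractions, whereas the table reports percentages.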

## Model details

This is the [Whisper Large v.3 model](https://hf.co/openai/whisper-large-v3) fine-tuned on the first release of the [CoRal data](https://huggingface.co/datasets/alexandrainst/coral).

The model was trained for 30K steps using the configuration from the [CoRal repository](https://github.com/alexandrainst/coral) by running:
```bash
python src/scripts/finetune_asr_model.py model=whisper-large max_steps=30000 model.learning_rate=1e-5
```

## License

Note that the dataset used for training is released under a custom license adapted from OpenRAIL-M. It allows commercial use with a few restrictions (on speech synthesis and biometric identification).
See the [license](https://huggingface.co/Alvenir/coral-1-whisper-large/blob/main/LICENSE) for details.

## Creators and Funders

The CoRal project is funded by the [Danish Innovation Fund](https://innovationsfonden.dk/) and consists of the following partners:

- [Alexandra Institute](https://alexandra.dk/)
- [University of Copenhagen](https://www.ku.dk/)
- [Agency for Digital Government](https://digst.dk/)
- [Alvenir](https://www.alvenir.ai/)
- [Corti](https://www.corti.ai/)

We would specifically like to thank Dan Saattrup Nielsen (Alexandra Institute) for, among other things, the repository work, and Simon Leminen Madsen (Alexandra Institute) for the modelling work.