whisper-small-es
Model Overview
This model was developed as part of a workshop organized by Yasmin Moslem on speech-to-text pipelines. The workshop's primary goal was accurate transcription and translation of spoken source languages into written target languages, while exploring both end-to-end and cascaded approaches.
This model is a fine-tuned version of OpenAI's Whisper-Small trained on the voxpopuli_es-ja dataset for Spanish Automatic Speech Recognition (ASR).
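For reference, below is a minimal transcription sketch using the Transformers pipeline API; the repo ID is taken from the model tree at the end of this card, and the audio file name is a placeholder.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline (repo ID from the model tree below).
asr = pipeline(
    "automatic-speech-recognition",
    model="Marianoleiras/whisper-small-es",
)

# "audio_es.wav" is a hypothetical local Spanish audio file.
result = asr("audio_es.wav")
print(result["text"])
```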
The model achieves the following results (a WER computation sketch follows the figures):
Evaluation Set:
- Loss: 0.2071
- WER: 9.5996
Test Set:
- WER: 10.1251 (baseline: 36.7506)
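The card does not state how the WER scores were computed; a minimal sketch using the Hugging Face evaluate library (an assumption about the tooling) would look like this.

```python
import evaluate

# Word error rate via the evaluate library (the exact tooling used for this card is an assumption).
wer_metric = evaluate.load("wer")

predictions = ["hola a todos", "gracias por venir"]    # hypothetical model transcripts
references = ["hola a todos", "gracias por no venir"]  # hypothetical reference transcripts

# compute() returns a fraction; multiplying by 100 matches the scale of the scores above.
print(100 * wer_metric.compute(predictions=predictions, references=references))
```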
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 1000
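As a rough illustration, these hyperparameters map onto Seq2SeqTrainingArguments as sketched below; output_dir, the evaluation cadence, and predict_with_generate are assumptions, not taken from this card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-es",   # assumed output directory
    learning_rate=5e-6,
    per_device_train_batch_size=32,  # per device; the card lists multi-GPU training
    per_device_eval_batch_size=16,
    seed=42,
    max_steps=1000,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default optimizer.
    eval_strategy="steps",           # assumed; matches the 250-step eval cadence below
    eval_steps=250,
    predict_with_generate=True,      # assumed; needed to report WER during evaluation
)
```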
Training results
| Training Loss | Epoch  | Step | Validation Loss | WER    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 0.2207        | 0.8013 | 250  | 0.2146          | 9.9606 |
| 0.1558        | 1.6026 | 500  | 0.2071          | 9.5996 |
| 0.1373        | 2.4038 | 750  | 0.2067          | 9.6622 |
| 0.1133        | 3.2051 | 1000 | 0.2055          | 9.6438 |
Framework versions
- Transformers 4.45.2
- PyTorch 2.4.0+cu124
- Datasets 3.2.0
- Tokenizers 0.20.3
Linked Models
- Whisper-Small-es-ja: An end-to-end model trained on this dataset.
- NLLB-200-Distilled-es-ja: The MT model of the cascaded approach built using this dataset (a cascaded-pipeline sketch follows this list).
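For context, the cascaded approach could be wired together with the Transformers pipeline API roughly as sketched below; the NLLB repo ID, the NLLB language codes, and the audio file name are assumptions, not taken from this card.

```python
from transformers import pipeline

# Step 1: Spanish ASR with this model (repo ID from the model tree below).
asr = pipeline("automatic-speech-recognition", model="Marianoleiras/whisper-small-es")

# Step 2: Spanish -> Japanese MT with the linked NLLB model.
# The repo ID and language codes here are assumptions.
mt = pipeline(
    "translation",
    model="Marianoleiras/nllb-200-distilled-es-ja",
    src_lang="spa_Latn",
    tgt_lang="jpn_Jpan",
)

spanish_text = asr("audio_es.wav")["text"]  # hypothetical audio file
japanese_text = mt(spanish_text)[0]["translation_text"]
print(japanese_text)
```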
Model Card Contact
Mariano González ([email protected])
Model tree for Marianoleiras/whisper-small-es
- Base model: openai/whisper-small