whisper-small-es
Model Overview
This model was developed as part of a workshop organized by Yasmin Moslem on speech-to-text pipelines. The workshop's primary goal was accurate transcription and translation of spoken source languages into written target languages, while exploring both end-to-end and cascaded approaches.
This model is a fine-tuned version of OpenAI's Whisper-Small trained on the voxpopuli_es-ja dataset for Spanish Automatic Speech Recognition (ASR).
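For reference, below is a minimal transcription sketch using the Transformers pipeline API; the repo ID is taken from the model tree at the end of this card, and the audio file name is a placeholder.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline (repo ID from the model tree below).
asr = pipeline(
    "automatic-speech-recognition",
    model="Marianoleiras/whisper-small-es",
)

# "audio_es.wav" is a hypothetical local Spanish audio file.
result = asr("audio_es.wav")
print(result["text"])
```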
The model achieves the following results (a WER computation sketch follows the figures):
Evaluation Set:
- Loss: 0.2071
- WER: 9.5996
Test Set:
- WER: 10.1251 (baseline: 36.7506)
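The card does not state how the WER scores were computed; a minimal sketch using the Hugging Face evaluate library (an assumption about the tooling) would look like this.

```python
import evaluate

# Word error rate via the evaluate library (the exact tooling used for this card is an assumption).
wer_metric = evaluate.load("wer")

predictions = ["hola a todos", "gracias por venir"]    # hypothetical model transcripts
references = ["hola a todos", "gracias por no venir"]  # hypothetical reference transcripts

# compute() returns a fraction; multiplying by 100 matches the scale of the scores above.
print(100 * wer_metric.compute(predictions=predictions, references=references))
```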
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 1000
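As a rough illustration, these hyperparameters map onto Seq2SeqTrainingArguments as sketched below; output_dir, the evaluation cadence, and predict_with_generate are assumptions, not taken from this card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-es",   # assumed output directory
    learning_rate=5e-6,
    per_device_train_batch_size=32,  # per device; the card lists multi-GPU training
    per_device_eval_batch_size=16,
    seed=42,
    max_steps=1000,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default optimizer.
    eval_strategy="steps",           # assumed; matches the 250-step eval cadence below
    eval_steps=250,
    predict_with_generate=True,      # assumed; needed to report WER during evaluation
)
```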
Training results
| Training Loss | Epoch  | Step | Validation Loss | WER    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 0.2207        | 0.8013 | 250  | 0.2146          | 9.9606 |
| 0.1558        | 1.6026 | 500  | 0.2071          | 9.5996 |
| 0.1373        | 2.4038 | 750  | 0.2067          | 9.6622 |
| 0.1133        | 3.2051 | 1000 | 0.2055          | 9.6438 |
Framework versions
- Transformers 4.45.2
- PyTorch 2.4.0+cu124
- Datasets 3.2.0
- Tokenizers 0.20.3
Linked Models
- Whisper-Small-es-ja: An end-to-end model trained on this dataset.
- NLLB-200-Distilled-es-ja: The MT model of the cascaded approach built using this dataset (a cascaded-pipeline sketch follows this list).
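For context, the cascaded approach could be wired together with the Transformers pipeline API roughly as sketched below; the NLLB repo ID, the NLLB language codes, and the audio file name are assumptions, not taken from this card.

```python
from transformers import pipeline

# Step 1: Spanish ASR with this model (repo ID from the model tree below).
asr = pipeline("automatic-speech-recognition", model="Marianoleiras/whisper-small-es")

# Step 2: Spanish -> Japanese MT with the linked NLLB model.
# The repo ID and language codes here are assumptions.
mt = pipeline(
    "translation",
    model="Marianoleiras/nllb-200-distilled-es-ja",
    src_lang="spa_Latn",
    tgt_lang="jpn_Jpan",
)

spanish_text = asr("audio_es.wav")["text"]  # hypothetical audio file
japanese_text = mt(spanish_text)[0]["translation_text"]
print(japanese_text)
```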
Model Card Contact
Mariano González ([email protected])
Model tree for Marianoleiras/whisper-small-es
- Base model: openai/whisper-small