whisper-small-es

Model Overview

This model was developed as part of a workshop organized by Yasmin Moslem on speech-to-text pipelines. The workshop's primary goal was to accurately transcribe and translate spoken source languages into written target languages while exploring both end-to-end and cascaded approaches.

This model is a fine-tuned version of OpenAI's Whisper-Small, trained on the voxpopuli_es-ja dataset for Spanish automatic speech recognition (ASR).
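
As a minimal usage sketch, the model can be loaded for inference with the transformers ASR pipeline; the repo id below matches this card, while the audio path is a placeholder:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub.
# chunk_length_s=30 enables chunked long-form transcription.
asr = pipeline(
    "automatic-speech-recognition",
    model="Marianoleiras/whisper-small-es",
    chunk_length_s=30,
)

# "audio.wav" is a placeholder path to a Spanish speech recording.
result = asr("audio.wav")
print(result["text"])
```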

The model achieves the following results on this dataset:

Evaluation Set:

  • Loss: 0.2071
  • WER: 9.5996

Test Set:

  • WER: 10.1251

(For comparison, the baseline Whisper-Small model without fine-tuning scores a WER of 36.7506 on the test set.)
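
WER here is the standard word error rate, reported as a percentage. As a sketch of how such scores can be computed (the actual evaluation script is not part of this card), the evaluate library's wer metric can be used as follows; the reference/prediction pairs are invented for illustration:

```python
import evaluate

wer_metric = evaluate.load("wer")

# Hypothetical Spanish reference/prediction pairs, for illustration only.
references = ["hola buenos días a todos", "gracias señor presidente"]
predictions = ["hola buenos días a todos", "gracias señora presidente"]

# compute() returns a fraction; scale by 100 to match the numbers above.
wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.4f}")
```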

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them to Seq2SeqTrainingArguments follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 1000
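
This is not the author's training script, only a sketch of how the values above might map onto Seq2SeqTrainingArguments; output_dir is a placeholder, and the multi-GPU distributed setup would be handled by the launcher rather than shown here:

```python
from transformers import Seq2SeqTrainingArguments

# Values mirror the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-es",   # placeholder output directory
    learning_rate=5e-6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    max_steps=1000,
    adam_beta1=0.9,                    # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```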

Training results

Training Loss   Epoch    Step   Validation Loss   WER
0.2207          0.8013    250   0.2146            9.9606
0.1558          1.6026    500   0.2071            9.5996
0.1373          2.4038    750   0.2067            9.6622
0.1133          3.2051   1000   0.2055            9.6438

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.4.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.3

Model Card Contact

Mariano González ([email protected])
