whisper_transcription_ID

Model description

This model is a fine-tuned version of openai/whisper-small on the Indonesian-English subset of the CoVoST2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3101
  • WER: 16.6264
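
WER here is the word error rate expressed as a percentage. As a minimal sketch, assuming the score was computed with the Hugging Face evaluate library (which uses jiwer under the hood), it can be reproduced like this; the example strings are hypothetical:

    import evaluate  # pip install evaluate jiwer

    wer_metric = evaluate.load("wer")

    predictions = ["halo dunia"]  # hypothetical model output
    references = ["halo dunia"]   # hypothetical reference transcript

    # evaluate returns a fraction; multiply by 100 to match the percentage above
    wer = 100 * wer_metric.compute(predictions=predictions, references=references)
    print(f"WER: {wer:.4f}")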

Intended uses & limitations

This model is intended for transcribing Indonesian audio.

How to Use

Here is how to use the model with Faster-Whisper:

  1. Convert the model into the CTranslate2 format with float16 quantization.

    !ct2-transformers-converter \
     --model cobrayyxx/whisper_transcription_ID \
     --output_dir ct2-whisper-small-transcription \
     --quantization float16 \
     --copy_files tokenizer_config.json
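
    The same conversion can also be done from Python. A minimal sketch using CTranslate2's TransformersConverter, equivalent to the CLI call above:

    from ctranslate2.converters import TransformersConverter

    # Convert the fine-tuned checkpoint to CTranslate2 format with FP16 weights.
    converter = TransformersConverter(
        "cobrayyxx/whisper_transcription_ID",
        copy_files=["tokenizer_config.json"],
    )
    converter.convert("ct2-whisper-small-transcription", quantization="float16")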
    
  2. Load the converted model using the faster_whisper library.

    from faster_whisper import WhisperModel
    
    model_name = "ct2-whisper-small-transcription"  # converted model (after fine-tuning)

    # Run on GPU with FP16
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    
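    If no GPU is available, the model can also run on CPU. A minimal sketch, assuming INT8 quantization is acceptable for your accuracy needs:

    # Run on CPU with INT8 (slower, but no CUDA libraries required)
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
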
  3. Now, the loaded model can be used.

      tgt_lang = "en"
      segments, info = model.transcribe(<any-array-of-indonesian-audio>,
                                        beam_size=5,
                                        language=tgt_lang,  # for transcription
                                        vad_filter=True,
                                        )

      transcription = " ".join([segment.text.strip() for segment in segments])
    
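      Note that transcribe returns a lazy generator, so segments are only decoded as you iterate over them. To inspect timestamps instead of joining the text, a sketch (assuming "audio.wav" is a hypothetical 16 kHz Indonesian recording; transcribe also accepts file paths):

      # Each segment carries start/end times in seconds plus the decoded text.
      segments, info = model.transcribe("audio.wav", beam_size=5, vad_filter=True)
      for segment in segments:
          print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")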

    Note: If you face a kernel error every time you run the code above, you have to install NVIDIA cuBLAS and cuDNN:

    apt update
    apt install libcudnn9-cuda-12
    

    and install the libraries using pip (see the faster-whisper documentation for more details):

    pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
    
    export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
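
    To verify that CTranslate2 can now see the GPU, a quick sanity check (get_cuda_device_count is part of the ctranslate2 package):

    python3 -c 'import ctranslate2; print(ctranslate2.get_cuda_device_count())'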
    

    Special thanks to Yasmin Moslem for her help in resolving this.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
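
For reference, these settings map onto a transformers Seq2SeqTrainingArguments configuration roughly like the sketch below; the output directory name is hypothetical:

    from transformers import Seq2SeqTrainingArguments

    # Hedged sketch of the configuration implied by the hyperparameters above.
    training_args = Seq2SeqTrainingArguments(
        output_dir="whisper-small-id-transcription",  # hypothetical path
        learning_rate=1e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=8,
        seed=42,
        optim="adamw_torch",          # AdamW, betas=(0.9, 0.999), eps=1e-8
        lr_scheduler_type="linear",
        num_train_epochs=10,
        fp16=True,                    # native AMP mixed-precision training
    )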

Training results

Training Loss   Epoch   Step   Validation Loss   WER
0.2435          1.0      128   0.2594            17.0899
0.0774          2.0      256   0.2510            17.0294
0.0347          3.0      384   0.2610            16.7271
0.0161          4.0      512   0.2812            16.8884
0.0087          5.0      640   0.2879            16.9690
0.0024          6.0      768   0.2983            16.6868
0.0015          7.0      896   0.3029            16.3241
0.0012          8.0     1024   0.3074            16.4248
0.0011          9.0     1152   0.3094            16.6062
0.0010         10.0     1280   0.3101            16.6264

Model Evaluation

The performance of the baseline and fine-tuned models was evaluated using the BLEU and chrF++ metrics on the validation set. The fine-tuned model shows a clear improvement over the baseline.

Model        BLEU    chrF++
Baseline     59.18   81.55
Fine-Tuned   74.22   88.33

Evaluation details

  • BLEU: Measures the overlap between predicted and reference text based on n-grams.
  • chrF++: Uses character n-grams (plus word n-grams in the "++" variant) for evaluation, making it particularly suitable for morphologically rich languages.
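
As a minimal sketch, both scores can be computed with the sacreBLEU library, assuming that is the implementation used; the example strings are hypothetical:

    import sacrebleu  # pip install sacrebleu

    predictions = ["contoh transkripsi dari model"]  # hypothetical hypotheses
    references = [["contoh transkripsi referensi"]]  # hypothetical references

    bleu = sacrebleu.corpus_bleu(predictions, references)
    chrf = sacrebleu.corpus_chrf(predictions, references, word_order=2)  # word_order=2 -> chrF++
    print(f"BLEU: {bleu.score:.2f}  chrF++: {chrf.score:.2f}")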

Framework versions

  • Transformers 4.48.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.3.0
  • Tokenizers 0.21.0

Credits

Huge thanks to Yasmin Moslem for mentoring me.
