whisper_transcription_ID
Model description
This model is a fine-tuned version of openai/whisper-small on the Indonesian-English CoVoST2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3101
- Wer: 16.6264
Intended uses & limitations
This model is intended for transcribing Indonesian speech audio.
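For a quick check without the CTranslate2 conversion described below, the checkpoint can also be loaded directly with the `transformers` ASR pipeline. This is a minimal sketch, assuming a GPU is available; the audio file name is a placeholder for any Indonesian recording.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint straight from the Hub.
asr = pipeline(
    "automatic-speech-recognition",
    model="cobrayyxx/whisper_transcription_ID",
    device=0,  # drop this argument to run on CPU
)

# "indonesian_sample.wav" is a placeholder for any Indonesian audio file.
print(asr("indonesian_sample.wav")["text"])
```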
How to Use
This is how to use the model with Faster-Whisper.
Convert the model into the CTranslate2 format with float16 quantization:

```bash
ct2-transformers-converter \
  --model cobrayyxx/whisper_transcription_ID \
  --output_dir ct2-whisper-small-transcription \
  --quantization float16 \
  --copy_files tokenizer_config.json
```
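If you prefer to run the conversion from Python instead of the CLI, a minimal sketch using CTranslate2's converter API (same model ID, output directory, and quantization as above):

```python
import ctranslate2

# Convert the Hugging Face checkpoint to CTranslate2 format with float16 weights.
converter = ctranslate2.converters.TransformersConverter(
    "cobrayyxx/whisper_transcription_ID",
    copy_files=["tokenizer_config.json"],
)
converter.convert("ct2-whisper-small-transcription", quantization="float16")
```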
Load the converted model using the `faster_whisper` library:

```python
from faster_whisper import WhisperModel

model_name = "ct2-whisper-small-transcription"  # converted model (after fine-tuning)

# Run on GPU with FP16
model = WhisperModel(model_name, device="cuda", compute_type="float16")
```
Now, the loaded model can be used for transcription:

```python
tgt_lang = "en"

segments, info = model.transcribe(
    <any-array-of-indonesian-audio>,  # placeholder: a 16 kHz audio array (or a file path)
    beam_size=5,
    language=tgt_lang,  # for transcription
    vad_filter=True,
)
transcription = " ".join([segment.text.strip() for segment in segments])
```
Note: If you face a kernel error every time you run the code above, you have to install `nvidia-cublas` and `nvidia-cudnn`:
```bash
apt update
apt install libcudnn9-cuda-12
```
Alternatively, install the libraries using pip and set `LD_LIBRARY_PATH` (read the documentation for more details):
```bash
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
```
Special thanks to Yasmin Moslem for her help in resolving this.
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them to `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
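As referenced above, these settings map roughly to the following `Seq2SeqTrainingArguments`. This is a reconstruction for illustration only, not the original training script; `output_dir` and the `fp16` flag (for Native AMP) are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Approximate reconstruction of the hyperparameters listed above (not the original script).
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper_transcription_ID",  # assumed output directory
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,  # Native AMP mixed-precision training
)
```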
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer     |
|---------------|-------|------|-----------------|---------|
| 0.2435        | 1.0   | 128  | 0.2594          | 17.0899 |
| 0.0774        | 2.0   | 256  | 0.2510          | 17.0294 |
| 0.0347        | 3.0   | 384  | 0.2610          | 16.7271 |
| 0.0161        | 4.0   | 512  | 0.2812          | 16.8884 |
| 0.0087        | 5.0   | 640  | 0.2879          | 16.9690 |
| 0.0024        | 6.0   | 768  | 0.2983          | 16.6868 |
| 0.0015        | 7.0   | 896  | 0.3029          | 16.3241 |
| 0.0012        | 8.0   | 1024 | 0.3074          | 16.4248 |
| 0.0011        | 9.0   | 1152 | 0.3094          | 16.6062 |
| 0.001         | 10.0  | 1280 | 0.3101          | 16.6264 |
Model Evaluation
The performance of the baseline and fine-tuned models was evaluated using the BLEU and chrF++ metrics on the validation dataset. The fine-tuned model shows a clear improvement over the baseline.
| Model      | BLEU  | chrF++ |
|------------|-------|--------|
| Baseline   | 59.18 | 81.55  |
| Fine-Tuned | 74.22 | 88.33  |
Evaluation details
- BLEU: Measures the overlap between predicted and reference text based on n-grams.
- chrF++: Uses character n-grams (plus word n-grams) for evaluation, making it particularly suitable for morphologically rich languages.
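Both scores can be reproduced with `sacrebleu`; a minimal sketch, assuming `predictions` and `references` are plain lists of strings:

```python
import sacrebleu

predictions = ["..."]  # placeholder: model outputs on the validation set
references = ["..."]   # placeholder: reference texts

bleu = sacrebleu.corpus_bleu(predictions, [references])
chrf = sacrebleu.corpus_chrf(predictions, [references], word_order=2)  # word_order=2 -> chrF++
print(f"BLEU: {bleu.score:.2f}  chrF++: {chrf.score:.2f}")
```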
Framework versions
- Transformers 4.48.3
- Pytorch 2.5.1+cu124
- Datasets 3.3.0
- Tokenizers 0.21.0
Credits:
Huge thanks to Yasmin Moslem for mentoring me.