whisper_transcription_ID

Model description

This model is a fine-tuned version of openai/whisper-small on the Indonesian-English subset of the CoVoST2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3101
  • WER: 16.6264
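
WER here is the word error rate expressed as a percentage. As a minimal sketch, assuming the score was computed with the Hugging Face evaluate library (which uses jiwer under the hood), it can be reproduced like this; the example strings are hypothetical:

    import evaluate  # pip install evaluate jiwer

    wer_metric = evaluate.load("wer")

    predictions = ["halo dunia"]  # hypothetical model output
    references = ["halo dunia"]   # hypothetical reference transcript

    # evaluate returns a fraction; multiply by 100 to match the percentage above
    wer = 100 * wer_metric.compute(predictions=predictions, references=references)
    print(f"WER: {wer:.4f}")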

Intended uses & limitations

This model is intended for transcribing Indonesian audio.

How to Use

Here is how to use the model with Faster-Whisper:

  1. Convert the model into the CTranslate2 format with float16 quantization.

    !ct2-transformers-converter \
     --model cobrayyxx/whisper_transcription_ID \
     --output_dir ct2-whisper-small-transcription \
     --quantization float16 \
     --copy_files tokenizer_config.json
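
    The same conversion can also be done from Python. A minimal sketch using CTranslate2's TransformersConverter, equivalent to the CLI call above:

    from ctranslate2.converters import TransformersConverter

    # Convert the fine-tuned checkpoint to CTranslate2 format with FP16 weights.
    converter = TransformersConverter(
        "cobrayyxx/whisper_transcription_ID",
        copy_files=["tokenizer_config.json"],
    )
    converter.convert("ct2-whisper-small-transcription", quantization="float16")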
    
  2. Load the converted model using the faster_whisper library.

    from faster_whisper import WhisperModel
    
    model_name = "ct2-whisper-small-transcription"  # converted model (after fine-tuning)

    # Run on GPU with FP16
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    
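    If no GPU is available, the model can also run on CPU. A minimal sketch, assuming INT8 quantization is acceptable for your accuracy needs:

    # Run on CPU with INT8 (slower, but no CUDA libraries required)
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
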
  3. Now, the loaded model can be used.

      tgt_lang = "en"
      segments, info = model.transcribe(<any-array-of-indonesian-audio>,
                                        beam_size=5,
                                        language=tgt_lang,  # for transcription
                                        vad_filter=True,
                                        )

      transcription = " ".join([segment.text.strip() for segment in segments])
    
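      Note that transcribe returns a lazy generator, so segments are only decoded as you iterate over them. To inspect timestamps instead of joining the text, a sketch (assuming "audio.wav" is a hypothetical 16 kHz Indonesian recording; transcribe also accepts file paths):

      # Each segment carries start/end times in seconds plus the decoded text.
      segments, info = model.transcribe("audio.wav", beam_size=5, vad_filter=True)
      for segment in segments:
          print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")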

    Note: If you face a kernel error every time you run the code above, you have to install NVIDIA cuBLAS and cuDNN:

    apt update
    apt install libcudnn9-cuda-12
    

    and install the libraries using pip (see the faster-whisper documentation for more details):

    pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
    
    export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
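
    To verify that CTranslate2 can now see the GPU, a quick sanity check (get_cuda_device_count is part of the ctranslate2 package):

    python3 -c 'import ctranslate2; print(ctranslate2.get_cuda_device_count())'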
    

    Special thanks to Yasmin Moslem for her help in resolving this.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
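
For reference, these settings map onto a transformers Seq2SeqTrainingArguments configuration roughly like the sketch below; the output directory name is hypothetical:

    from transformers import Seq2SeqTrainingArguments

    # Hedged sketch of the configuration implied by the hyperparameters above.
    training_args = Seq2SeqTrainingArguments(
        output_dir="whisper-small-id-transcription",  # hypothetical path
        learning_rate=1e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=8,
        seed=42,
        optim="adamw_torch",          # AdamW, betas=(0.9, 0.999), eps=1e-8
        lr_scheduler_type="linear",
        num_train_epochs=10,
        fp16=True,                    # native AMP mixed-precision training
    )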

Training results

Training Loss   Epoch   Step   Validation Loss   WER
0.2435          1.0      128   0.2594            17.0899
0.0774          2.0      256   0.2510            17.0294
0.0347          3.0      384   0.2610            16.7271
0.0161          4.0      512   0.2812            16.8884
0.0087          5.0      640   0.2879            16.9690
0.0024          6.0      768   0.2983            16.6868
0.0015          7.0      896   0.3029            16.3241
0.0012          8.0     1024   0.3074            16.4248
0.0011          9.0     1152   0.3094            16.6062
0.0010         10.0     1280   0.3101            16.6264

Model Evaluation

The performance of the baseline and fine-tuned models was evaluated using the BLEU and chrF++ metrics on the validation set. The fine-tuned model shows a clear improvement over the baseline.

Model        BLEU    chrF++
Baseline     59.18   81.55
Fine-Tuned   74.22   88.33

Evaluation details

  • BLEU: Measures the overlap between predicted and reference text based on n-grams.
  • chrF++: Uses character n-grams (plus word n-grams in the "++" variant) for evaluation, making it particularly suitable for morphologically rich languages.
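
As a minimal sketch, both scores can be computed with the sacreBLEU library, assuming that is the implementation used; the example strings are hypothetical:

    import sacrebleu  # pip install sacrebleu

    predictions = ["contoh transkripsi dari model"]  # hypothetical hypotheses
    references = [["contoh transkripsi referensi"]]  # hypothetical references

    bleu = sacrebleu.corpus_bleu(predictions, references)
    chrf = sacrebleu.corpus_chrf(predictions, references, word_order=2)  # word_order=2 -> chrF++
    print(f"BLEU: {bleu.score:.2f}  chrF++: {chrf.score:.2f}")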

Framework versions

  • Transformers 4.48.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.3.0
  • Tokenizers 0.21.0

Credits

Huge thanks to Yasmin Moslem for mentoring me.
