	---
	library_name: transformers
	language:
	- twi
	license: apache-2.0
	base_model: openai/whisper-tiny
	tags:
	- custom-dataset
	- local-dataset
	- whisper
	- generated_from_trainer
	metrics:
	- wer
	model-index:
	- name: T6-Whisper-FineTuned-DL-Twi
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# T6-Whisper-FineTuned-DL-Twi

	This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on a custom dataset of Twi, a native Ghanaian language.
	It achieves the following results on the evaluation set:
	- Loss: 0.0063
	- Wer: 23.4562
	- Cer: 21.7611
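
For readers unfamiliar with the metrics, WER and CER are conventionally the word-level and character-level edit distance between the reference transcript and the model output, divided by the reference length. The sketch below illustrates that definition in plain Python. It assumes the values above are percentages, and it is not the evaluation code used for this model; the text normalization applied before scoring is not documented here, and the example sentence is purely illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char"), as a percentage."""
    split = (lambda s: s.split()) if unit == "word" else list
    errors = sum(edit_distance(split(r), split(h))
                 for r, h in zip(references, hypotheses))
    total = sum(len(split(r)) for r in references)
    return 100.0 * errors / total


# Illustrative strings only, not taken from the evaluation set.
refs = ["send money to my account"]
hyps = ["send money to account"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
```

Here one dropped word out of five gives a WER of 20%, while the same error costs only three characters out of 24, which is why CER is usually the lower of the two numbers.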

	## Model description

	T6-Whisper-FineTuned-DL-Twi is a fine-tuned version of openai/whisper-tiny focused specifically on the Twi language, a widely spoken native language in Ghana. This model adapts Whisper’s multilingual speech recognition capabilities to better understand and transcribe Twi speech, especially in financial contexts.

	It was developed as part of a project to support accessibility in financial systems, aiming to make digital financial services more inclusive for Ghanaian communities that primarily communicate in Twi.

	## Intended uses & limitations

	Intended uses:
	- Automatic Speech Recognition (ASR) for Twi and English-Twi mixed audio.
	- Enhancing voice interfaces in fintech platforms (e.g., mobile banking, customer support).
	- Increasing accessibility for low-literate or visually impaired users in financial contexts.
	- Supporting research in code-switched speech and low-resource African languages.

	Limitations:
	- May not perform optimally outside the financial domain (e.g., health or legal speech).
	- Performance can degrade in noisy environments or with heavy accents not represented in the training data.
	- While it handles code-switching, rapid or highly irregular switches may still reduce accuracy.
	- Based on Whisper-tiny, the smallest Whisper variant, which favors speed and small size over peak accuracy.
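
As a starting point for the ASR use case listed above, a minimal sketch with the `transformers` pipeline API might look like the following. The `MODEL_ID` value is a hypothetical placeholder, not the real repository path, and `transcribe` is a helper name introduced here for illustration.

```python
# Hypothetical placeholder: replace with the actual Hub path of this checkpoint.
MODEL_ID = "your-namespace/your-twi-whisper-checkpoint"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe a single audio file and return the recognized text."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # decoding from a file path requires ffmpeg
    return result["text"]
```

Calling `transcribe("sample.wav")` downloads the checkpoint on first use. Whisper processes audio in 30-second windows, so for longer recordings it may help to pass `chunk_length_s=30` when calling the pipeline.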

	## Training and evaluation data

	The model was fine-tuned using a custom dataset containing Twi and English-Twi code-switched audio, primarily from the financial domain. This includes content like:

	- Mobile money instructions
	- Banking app voice interactions
	- Financial literacy radio shows and interviews
	- Call center conversations involving customer queries

	Dataset details:
	- Dataset size: ~50 hours
	- Language mix: Twi + English (code-switched)
	- Transcript quality: Manually verified by native speakers
	- Train/validation split: [e.g., 80/20]


	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- training_steps: 4000
	- mixed_precision_training: Native AMP
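
To make the schedule concrete, the sketch below computes the learning rate implied by a linear scheduler with 500 warmup steps, 4000 total steps, and a peak of 1e-05. It follows the usual linear-warmup, linear-decay rule and assumes no other scheduler options were changed; `linear_schedule_lr` is a name introduced here for illustration.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=4000):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0.0, (total_steps - step) / (total_steps - warmup_steps))
    return base_lr * remaining


# Learning rate at each evaluation step reported in the training results.
for step in (1000, 2000, 3000, 4000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 500 and reaches zero at step 4000, so the last evaluation happens with almost no learning signal left.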

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Wer     | Cer     |
	|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
	| 0.025         | 0.6333 | 1000 | 0.0285          | 27.9879 | 21.3775 |
	| 0.0083        | 1.2666 | 2000 | 0.0094          | 20.4318 | 17.7329 |
	| 0.0058        | 1.8999 | 3000 | 0.0072          | 19.5177 | 17.5028 |
	| 0.0012        | 2.5332 | 4000 | 0.0063          | 23.4562 | 21.7611 |
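
The headline results match the final checkpoint (step 4000), although validation loss keeps falling while WER and CER are lowest at step 3000. The sketch below reads the logged rows, picks the checkpoint with the lowest WER, and roughly estimates the number of training examples from the epoch/step ratio. The estimate assumes a single device and no gradient accumulation, neither of which is stated in the hyperparameters.

```python
# Rows copied from the training results table.
rows = [
    {"step": 1000, "epoch": 0.6333, "val_loss": 0.0285, "wer": 27.9879, "cer": 21.3775},
    {"step": 2000, "epoch": 1.2666, "val_loss": 0.0094, "wer": 20.4318, "cer": 17.7329},
    {"step": 3000, "epoch": 1.8999, "val_loss": 0.0072, "wer": 19.5177, "cer": 17.5028},
    {"step": 4000, "epoch": 2.5332, "val_loss": 0.0063, "wer": 23.4562, "cer": 21.7611},
]

# Checkpoint with the lowest word error rate.
best = min(rows, key=lambda r: r["wer"])

# Steps per epoch follow from any row: step / epoch.
steps_per_epoch = round(rows[0]["step"] / rows[0]["epoch"])

# Assumption: effective batch size equals train_batch_size (16).
approx_examples = steps_per_epoch * 16

print(best["step"], steps_per_epoch, approx_examples)
```

If checkpoints were saved at each evaluation step, selecting by WER rather than by validation loss would favor the step-3000 weights.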


	### Framework versions

	- Transformers 4.48.0.dev0
	- Pytorch 2.5.1+cu121
	- Datasets 3.2.0
	- Tokenizers 0.21.0