Whosper-large

Model Overview

Whosper-large is a fine-tuned version of openai/whisper-large-v2 optimized for speech recognition in Wolof, Senegal's primary language, while maintaining strong multilingual capabilities. It advances African language processing with notable improvements in Word Error Rate (WER) and Character Error Rate (CER). The model is intended for researchers, developers, and students working with Wolof speech data, whether they are transcribing conversations, building language-learning tools, or conducting research.

Key Strengths

Strong Multilingual: Excellent performance in Wolof, French, and English
Code-Switching: Handles natural language mixing, especially Wolof-French
Consistent Results: Maintains quality across different languages
Open Source: Released under the apache-2.0 license
African NLP: Supporting African language technology development

Performance Metrics

WER: 0.2423
CER: 0.1135
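
The card does not say which tool produced these scores. For reference, here is a minimal plain-Python sketch of how the two metrics are usually defined: each is a Levenshtein edit distance divided by the reference length, counted over words for WER and over characters for CER. The function names and sample strings are illustrative assumptions and are not part of the Whosper package.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up strings, only to show the call shape.
print(wer("a b c d", "a b x d"))  # one substituted word out of four
print(cer("abcd", "abxd"))        # one substituted character out of four
```

On this scale, a WER of 0.2423 corresponds to roughly 24 word errors per 100 reference words.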

Key Features

Strong multilingual performance (Wolof, French, English)
Excellent performance on code-switched content
Consistent performance across different languages

Limitations

Outputs in lowercase only
Limited punctuation support
Reduced accuracy on low-quality audio
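
Because the model emits lowercase text with little punctuation, comparing its output against cased, punctuated reference transcripts would count formatting differences as errors. One common workaround is to normalize both sides before scoring. This is a sketch under that assumption, not the procedure used for the reported scores. `normalize_for_scoring` is a hypothetical helper, and keeping apostrophes is a choice made here for French elisions.

```python
import unicodedata


def normalize_for_scoring(text):
    """Lowercase, replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    cleaned = []
    for ch in text:
        is_punct = unicodedata.category(ch).startswith("P")
        # Keep apostrophes so forms like "c'est" stay one token.
        cleaned.append(" " if is_punct and ch != "'" else ch)
    return " ".join("".join(cleaned).split())


print(normalize_for_scoring("L'école, c'est BIEN!"))
```

Applying the same function to both the reference and the model output keeps the comparison focused on the words themselves.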

Training Data

Trained on diverse Wolof speech data:

ALFFA Public Dataset
FLEURS Dataset
Bus Urbain Dataset
Kallama Dataset

Quick Start Guide

Installation

pip install git+https://github.com/sudoping01/[email protected]

Basic Usage

from whosper import WhosperTranscriber

# Initialize the transcriber
transcriber = WhosperTranscriber(model_id="CAYTU/whosper-large") 

# Transcribe an audio file
result = transcriber.transcribe_audio("path/to/your/audio.wav")
print(result)
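
To transcribe several files, the same call can be wrapped in a loop. The sketch below relies only on the `transcribe_audio` method shown above. The helper name `transcribe_folder`, the `*.wav` pattern, and the skip-on-error behaviour are assumptions for illustration and are not part of the Whosper package.

```python
from pathlib import Path


def transcribe_folder(transcriber, folder, pattern="*.wav"):
    """Run transcriber.transcribe_audio on each matching file in `folder`.

    Returns a dict mapping file name to whatever transcribe_audio returns,
    or None for files that raised an error.
    """
    results = {}
    for audio_path in sorted(Path(folder).glob(pattern)):
        try:
            results[audio_path.name] = transcriber.transcribe_audio(str(audio_path))
        except Exception as exc:  # keep going if one file fails
            print(f"Skipping {audio_path.name}: {exc}")
            results[audio_path.name] = None
    return results
```

With the `transcriber` object created above, `transcribe_folder(transcriber, "path/to/your/audio_dir")` would return one entry per `.wav` file in that directory.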

Training Results

Training Loss	Epoch	Step	Validation Loss
3.0514	1.0	1732	0.6824
2.2658	2.0	3464	0.5998
2.0274	3.0	5196	0.5282
1.48	4.0	6928	0.4793
1.1693	5.0	8660	0.4441
0.8762	5.9970	10386	0.4371

Framework Versions

PEFT: 0.14.1.dev0
Transformers: 4.48.0.dev0
PyTorch: 2.5.1+cu124
Datasets: 3.2.0
Tokenizers: 0.21.0

Contributing to African NLP

Whosper-large embodies our commitment to open science and the advancement of African language technologies. We believe that by making cutting-edge speech recognition models freely available, we can accelerate NLP development across Africa.

Join our mission to democratize AI technology:

Open Science: Use and build upon our research - all code, models, and documentation are open source
Research Collaboration: Integrate Whosper into your research projects and share your findings
Community Building: Help us create resources for African language processing
Educational Impact: Use Whosper in educational settings to train the next generation of African AI researchers

License

Apache License 2.0

This model is released under the Apache License 2.0 to encourage research, commercial use, and innovation in African language technologies while ensuring proper attribution and patent protection.

Citation

@misc{whosper2025,
  title={Whosper-large: A Multilingual ASR Model for Wolof with Enhanced Code-Switching Capabilities},
  author={Seydou DIALLO},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/CAYTU/whosper-large},
  version={1.0}
}

Acknowledgments

Developed by Seydou DIALLO at Caytu Robotics's AI Department, building on OpenAI's Whisper-large-v2. Special thanks to the Wolof-speaking community and contributors advancing African language technology.

Contact Us

For questions or support, contact us:

Email : [email protected]
