Bambara-ASR-v2
Model Overview
Bambara-ASR-v2 is a cutting-edge speech recognition model tailored for Bambara (Bamanankan), Mali's primary language spoken by over 14 million people across West Africa. Built on OpenAI's Whisper-large-v2, this model represents a significant step in making technology accessible to Bambara speakers. Developed as part of the MALIBA-AI initiative, it embodies our commitment to ensuring no Malian is left behind in the AI revolution.
The model focuses on real-world applications and code-switching scenarios, which are particularly important in Mali's multilingual context, where Bambara often interweaves with French.
Key Strengths
- Superior Code-Switching: Handles natural Bambara-French mixing, reflecting real-world speech patterns
- Phonetic Adaptation: Accurately transcribes French words as pronounced in Bambara context
- Production-Ready: Thoroughly tested on real-world scenarios
- Open Source: Released under the Apache 2.0 license
- African NLP Focus: Contributing to the broader goal of comprehensive African language support
Performance Metrics
- WER: 0.3064
- CER: 0.1261
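These scores correspond to roughly 30.6% of words and 12.6% of characters transcribed incorrectly on the held-out test set. Both metrics are edit-distance rates; the sketch below (plain Python, with hypothetical example strings not taken from the actual test set) shows how they are computed:

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over a token sequence.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution (free if equal)
    return d[-1]

def wer(reference, hypothesis):
    # Word Error Rate: word-level edits divided by reference word count.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: character-level edits divided by reference length.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# Hypothetical Bambara example: one substituted word out of three.
print(wer("i ni ce", "i ni se"))  # → 0.333...
print(cer("i ni ce", "i ni se"))  # → 0.142... (1 of 7 characters)
```

In practice libraries such as `jiwer` or `evaluate` compute these metrics; the point here is only what the reported numbers measure.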
Key Features
- Robust handling of code-switched Bambara-French content
- Accurate transcription of French words in Bambara context
- Strong performance on real-world scenarios
- Optimized for practical applications
Training Data
Trained on a diverse combination of datasets:
- Jeli ASR Dataset: Primary training corpus with extensive Bambara speech
- RT-Data-Collection: Additional audio samples (a minimal subset; recorded in poor conditions and not enhanced)
This combination ensures the model performs well across:
- Natural speech patterns
- Code-switching scenarios
- Various speaking contexts
- Different recording conditions
Quick Start Guide
Installation
# Installation instructions coming soon
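Until official instructions are published, a plausible setup based on the standard Hugging Face stack implied by the framework versions listed below (the package names are the usual PyPI ones, not confirmed by this card):

```shell
# Core stack: PyTorch, Transformers, and PEFT (the adapter framework listed below),
# plus common audio I/O dependencies.
pip install torch transformers peft datasets librosa soundfile
```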
Basic Usage
# Usage instructions coming soon
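Until official usage instructions are published, here is a minimal sketch of how a PEFT adapter on Whisper is typically loaded. The assumption that this repository hosts a PEFT adapter on top of openai/whisper-large-v2 is inferred from the framework versions below, not confirmed by this card; running it downloads the full base model weights.

```python
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Assumption: sudoping01/bambara-asr-v2 is a PEFT adapter over whisper-large-v2.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
model = PeftModel.from_pretrained(base, "sudoping01/bambara-asr-v2")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model.eval()

def transcribe(audio, sampling_rate=16000):
    # `audio` is a 16 kHz mono waveform as a 1-D float array.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(input_features=inputs.input_features)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```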
Training Details
Training Hyperparameters
learning_rate: 0.001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 50
num_epochs: 6
mixed_precision_training: Native AMP
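The per-device batch size of 8 combined with 4 gradient-accumulation steps yields the effective train batch size of 32 listed above. The linear schedule with 50 warmup steps can be sketched in plain Python (the total step count is taken from the training results table; this is an illustration of the schedule's shape, not the trainer's exact code):

```python
LEARNING_RATE = 1e-3   # learning_rate above
WARMUP_STEPS = 50      # lr_scheduler_warmup_steps
TOTAL_STEPS = 10242    # final step in the training results table

def lr_at(step):
    # Linear warmup from 0 to the peak, then linear decay back to 0
    # (the shape of the Transformers "linear" scheduler).
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    return LEARNING_RATE * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

effective_batch = 8 * 4  # train_batch_size × gradient_accumulation_steps
print(effective_batch)        # → 32
print(lr_at(WARMUP_STEPS))    # peak LR reached right after warmup
```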
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.4865 | 1.0 | 1708 | 0.4902 |
| 0.4231 | 2.0 | 3416 | 0.4316 |
| 0.3838 | 3.0 | 5124 | 0.3964 |
| 0.3197 | 4.0 | 6832 | 0.3725 |
| 0.2520 | 5.0 | 8540 | 0.3558 |
| 0.2209 | 5.9969 | 10242 | 0.3517 |
Framework Versions
- PEFT: 0.14.1.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Contributing to MALIBA-AI and African NLP
Bambara-ASR-v2 is developed as part of MALIBA-AI, a community initiative dedicated to democratizing AI technology across Mali's linguistic landscape. This model embodies our commitment to open science and the advancement of African language technologies, ensuring that technological progress serves every Malian.
Join our mission to democratize AI technology and ensure no Malian is left behind:
- Open Science: Use and build upon our research - all code, models, and documentation are open source
- Data Contribution: Share your Bambara speech datasets to help improve model performance
- Research Collaboration: Integrate the model into your research projects and share your findings
- Community Building: Help us create resources for African language processing
- Educational Impact: Use the model in educational settings to train the next generation of African AI researchers
Together, we can ensure African languages are well-represented in the future of AI technology. Whether you're a researcher, developer, educator, or language enthusiast, your contributions can help bridge the technological divide.
License
This model is released under the Apache 2.0 license to encourage research, commercial use, and innovation in African language technologies while ensuring proper attribution and patent protection. You are free to:
- Use the model commercially
- Modify and distribute the model
- Create derivative works
- Use the model for patent purposes
Choosing Apache 2.0 aligns with our goals of open science and advancing African NLP while providing necessary protections for the community.
Citation
@misc{bambara-asr2025,
  title={Bambara-ASR-v2: An ASR Model for Bambara with Enhanced Code-Switching Capabilities},
  author={MALIBA-AI},
  year={2025},
  publisher={MALIBA-AI},
  url={https://huggingface.co/sudoping01/bambara-asr-v2},
  version={2.0}
}
Acknowledgments
Developed by MALIBA-AI, building on OpenAI's Whisper-large-v2. Special thanks to the Bambara-speaking community and contributors from the Jeli ASR dataset project, and RT-Data-Collection initiative.
Try It Now!
Ready to transcribe Bambara audio accurately? Download Bambara-ASR-v2 and join MALIBA-AI in building a future where technology serves every Malian, in every language, through the power of community-driven innovation!
Evaluation Results
- Base model: openai/whisper-large-v2
- Test WER on the oza75/bambara-asr test set (self-reported): 30.640
- Test CER on the oza75/bambara-asr test set (self-reported): 12.610