---
library_name: peft
license: apache-2.0
base_model: openai/whisper-large-v2
tags:
- generated_from_trainer
- multilingual
- ASR
- Open-Source
language:
- wo
- fr
- en
model-index:
- name: whosper-large-v2
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Test Set
      type: custom
      split: test
      args:
        language: wo
    metrics:
    - name: Test WER
      type: wer
      value: 23.45 
    - name: Test CER
      type: cer
      value: 11.01
pipeline_tag: automatic-speech-recognition
---

# Whosper-large-v2

## Model Overview
Whosper-large-v2 is a speech recognition model tailored for Wolof, Senegal's primary language. Built on OpenAI's [Whisper-large-v2](https://huggingface.co/openai/whisper-large-v2), it improves on [whosper-large](https://huggingface.co/sudoping01/whosper-large) in both Word Error Rate (WER) and Character Error Rate (CER). It is designed for researchers, developers, and students working with Wolof speech data, whether they are transcribing conversations, building language learning tools, or conducting research.

### Key Strengths
- **Superior Code-Switching**: Handles natural Wolof-French/English mixing, mirroring real-world speech patterns
- **Multilingual**: Performs well in French and English in addition to Wolof
- **Production-Ready**: Thoroughly tested and optimized for deployment
- **Open Source**: Released under the [apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license, well suited to research and development
- **African NLP Focus**: Contributing to the broader goal of comprehensive African language support

## Performance Metrics
- **WER**: 0.2345 (23.45%)
- **CER**: 0.1101 (11.01%)

Both scores are measured on the Wolof test set. Lower values mean better accuracy.
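
For readers new to these metrics, here is a minimal sketch of how WER and CER are conventionally defined: the edit distance between reference and hypothesis, divided by the reference length (in words for WER, in characters for CER). This card does not state which tool produced the scores above, so the helper names below (`edit_distance`, `wer`, `cer`) are illustrative assumptions, not the evaluation code used for this model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not real transcripts:
print(wer("a b c d", "a x c"))  # one substitution + one deletion over 4 words -> 0.5
print(cer("abcd", "abed"))      # one substitution over 4 characters -> 0.25
```

Real evaluations usually normalize text (casing, punctuation) before scoring, which can change the numbers noticeably.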

### Performance Comparison

| Metric | Whosper-large-v2 | Whosper-large | Relative improvement |
|--------|------------------|---------------|----------------------|
| WER    | 0.2345           | 0.2423        | 3.2% lower           |
| CER    | 0.1101           | 0.1135        | 3.0% lower           |
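
The improvement figures in the table are relative reductions, not absolute differences. A short sketch of the arithmetic, using only the values from the table (the function name `relative_improvement` is ours):

```python
def relative_improvement(baseline, new):
    """Percentage reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline * 100


# (whosper-large, whosper-large-v2) scores taken from the table above
scores = {"WER": (0.2423, 0.2345), "CER": (0.1135, 0.1101)}

for metric, (old, new) in scores.items():
    print(f"{metric}: {relative_improvement(old, new):.1f}% relative reduction")
# WER: 3.2% relative reduction
# CER: 3.0% relative reduction
```

In absolute terms, the gains are 0.78 WER points and 0.34 CER points.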

## Key Features
- Improved WER and CER compared to [whosper-large](https://huggingface.co/sudoping01/whosper-large)
- Optimized for Wolof and French recognition
- Enhanced performance on bilingual content

## Limitations
- Reduced performance on English compared to [whosper-large](https://huggingface.co/sudoping01/whosper-large)
- Less effective for general multilingual content compared to [whosper-large](https://huggingface.co/sudoping01/whosper-large)
- Reduced accuracy on very low-quality audio

## Training Data
Trained on diverse Wolof speech data:

- **ALFFA Public Dataset**
- **FLEURS Dataset**
- **Bus Urbain Dataset**
- **Anta Women TTS Dataset** 
- **Kallama Dataset**

This diversity is intended to help the model generalize across:
- Speaking styles and dialects
- Code-switching patterns
- Gender and age groups
- Recording conditions

## Quick Start Guide

### Installation
```bash
pip install git+https://github.com/sudoping01/[email protected]
```

### Basic Usage
```python
from whosper import WhosperTranscriber

# Initialize the transcriber
transcriber = WhosperTranscriber(model_id="CAYTU/whosper-large-v2") 

# Transcribe an audio file
result = transcriber.transcribe_audio("path/to/your/audio.wav")
print(result)
```
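
To transcribe several recordings at once, the single-file call above can be wrapped in a small loop. The sketch below is an illustration rather than part of the `whosper` package: `transcribe_folder` is a hypothetical helper, and it relies only on the `transcribe_audio` method shown above. It stores whatever that method returns without assuming a particular return type.

```python
from pathlib import Path


def transcribe_folder(transcriber, folder, pattern="*.wav"):
    """Transcribe every file in `folder` matching `pattern`.

    `transcriber` is any object exposing `transcribe_audio(path)`,
    such as the WhosperTranscriber created in Basic Usage.
    Returns a dict mapping file name to the transcription result.
    """
    results = {}
    # Sort so the output order is stable across runs and platforms
    for audio_path in sorted(Path(folder).glob(pattern)):
        results[audio_path.name] = transcriber.transcribe_audio(str(audio_path))
    return results
```

With the `transcriber` from Basic Usage, `transcribe_folder(transcriber, "path/to/your/recordings")` would return one entry per `.wav` file in that folder.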


## Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.7575 | 0.9998 | 2354 | 0.7068 |
| 0.6429 | 1.9998 | 4708 | 0.6073 |
| 0.5468 | 2.9998 | 7062 | 0.5428 |
| 0.4439 | 3.9998 | 9416 | 0.4935 |
| 0.3208 | 4.9998 | 11770 | 0.4600 |
| 0.2394 | 5.9998 | 14124 | 0.4490 |
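
One way to read the table is to look at how much the validation loss drops per epoch and how far it sits above the training loss at the end. The snippet below is a small sketch over the table values only, with epochs rounded to whole numbers. Any conclusion about convergence or overfitting drawn from it is an interpretation, not a claim made by the training logs.

```python
# (epoch, training loss, validation loss), copied from the table above
history = [
    (1, 0.7575, 0.7068),
    (2, 0.6429, 0.6073),
    (3, 0.5468, 0.5428),
    (4, 0.4439, 0.4935),
    (5, 0.3208, 0.4600),
    (6, 0.2394, 0.4490),
]

# Drop in validation loss from each epoch to the next
val_drops = [round(prev[2] - curr[2], 4) for prev, curr in zip(history, history[1:])]

# Gap between validation and training loss after the last epoch
final_gap = round(history[-1][2] - history[-1][1], 4)

print("Validation loss drop per epoch:", val_drops)
print("Final validation/training gap:", final_gap)
```

The validation loss falls every epoch, but by smaller amounts each time, while the training loss keeps falling faster. This suggests that additional epochs would likely bring limited further gains on this validation set.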

## Framework Versions
- PEFT: 0.14.1.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Contributing to African NLP
Whosper-large-v2 embodies our commitment to open science and the advancement of African language technologies. We believe that by making cutting-edge speech recognition models freely available, we can accelerate NLP development across Africa.

Join our mission to democratize AI technology:
- **Open Science**: Use and build upon our research - all code, models, and documentation are open source
- **Data Contribution**: Share your Wolof speech datasets to help improve model performance
- **Research Collaboration**: Integrate Whosper into your research projects and share your findings
- **Community Building**: Help us create resources for African language processing
- **Educational Impact**: Use Whosper in educational settings to train the next generation of African AI researchers

Together, we can ensure African languages are well-represented in the future of AI technology. Whether you're a researcher, developer, educator, or language enthusiast, your contributions can help bridge the technological divide.

## License
[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

This model is released under the Apache 2.0 license to encourage research, commercial use, and innovation in African language technologies, while requiring attribution and providing an express patent grant. Under the license, you are free to:
- Use the model commercially
- Modify and distribute the model
- Create derivative works
- Rely on the patent license that contributors grant under Apache 2.0

Choosing Apache 2.0 aligns with our goals of open science and advancing African NLP while providing necessary protections for the community.


## Citation
```bibtex
@misc{whosper2025,
  title={Whosper-large: A Multilingual ASR Model for Wolof with Enhanced Code-Switching Capabilities},
  author={Seydou DIALLO},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/CAYTU/whosper-large},
  version={1.0}
}
```

## Acknowledgments
Developed by [Seydou DIALLO](https://www.linkedin.com/in/seydou-diallo-08ab311ba) at [Caytu Robotics](https://caytu.ai)'s AI Department, building on OpenAI's [Whisper-large-v2](https://huggingface.co/openai/whisper-large-v2). Special thanks to the Wolof-speaking community and contributors advancing African language technology.

## Contact Us
For questions or support, contact us:

Email: [email protected]