# Whosper-large-v3

## Model Overview

Whosper-large-v3 is a cutting-edge speech recognition model tailored for Wolof, Senegal's primary language. Built on OpenAI's [Whisper-large-v2](https://huggingface.co/openai/whisper-large-v2), it advances African language processing with notable improvements in Word Error Rate (WER) and Character Error Rate (CER). It is aimed at researchers, developers, and students working with Wolof speech data, whether you're transcribing conversations, building language-learning tools, or conducting research.

### Key Strengths

- **Superior Code-Switching**: Handles natural Wolof-French/English mixing, mirroring real-world speech patterns
- **Multilingual**: Performs well in French and English in addition to Wolof
- **Production-Ready**: Thoroughly tested and optimized for deployment
- **Open Source**: Released under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license, perfect for research and development
- **African NLP Focus**: Contributing to the broader goal of comprehensive African language support

## Performance Metrics

- **WER**: 0.2345
- **CER**: 0.1101
- **Loss**: 0.4490

Lower values mean better accuracy, which is ideal for practical applications.

### Performance Comparison

| Metric | Whosper-large-v3 | Whosper-large | Improvement |
|--------|------------------|---------------|-------------|
| WER    | 0.2345           | 0.2423        | 3.2% better |
| CER    | 0.1101           | 0.1135        | 3.0% better |

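The "Improvement" column is the relative reduction in error, i.e. (old − new) / old. A quick check of the reported figures:

```python
def relative_improvement(old: float, new: float) -> float:
    """Relative error reduction, as a percentage of the old error rate."""
    return (old - new) / old * 100

# Reproduce the table's "Improvement" column from the reported error rates
wer_gain = relative_improvement(0.2423, 0.2345)
cer_gain = relative_improvement(0.1135, 0.1101)
print(f"WER: {wer_gain:.1f}% better, CER: {cer_gain:.1f}% better")
```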
## Key Features

- Improved WER and CER compared to [whosper-large](https://huggingface.co/sudoping01/whosper-large)
- Optimized for Wolof and French recognition
- Enhanced performance on bilingual content

## Limitations

- Reduced performance on English compared to whosper-large
- Less effective for general multilingual content compared to whosper-large

## Training Data

Trained on diverse Wolof speech data:

- **ALFFA Public Dataset**
- **FLEURS Dataset**
- **Bus Urbain Dataset**
- **Anta Women TTS Dataset**
- **Kallama Dataset**

This diversity ensures the model excels across:

- Speaking styles and dialects
- Code-switching patterns
- Gender and age groups
- Recording conditions

## Quick Start Guide

### Installation

```bash
pip install git+https://github.com/sudoping01/[email protected]
```

### Basic Usage

```python
from whosper import WhosperTranscriber

# Initialize the transcriber
transcriber = WhosperTranscriber(model_id="sudoping01/whosper-large-v3")

# Transcribe an audio file
result = transcriber.transcribe_audio("path/to/your/audio.wav")
print(result)
```

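To transcribe many files, the same API can be wrapped in a small loop. `transcribe_directory` below is a hypothetical helper, not part of the whosper package; it assumes only the `transcribe_audio` method shown above.

```python
from pathlib import Path

def transcribe_directory(transcriber, audio_dir, pattern="*.wav"):
    """Run transcriber.transcribe_audio over every matching file.

    `transcriber` is any object exposing transcribe_audio(path) -> str,
    e.g. the WhosperTranscriber shown above. Returns {filename: transcript}.
    """
    results = {}
    for path in sorted(Path(audio_dir).glob(pattern)):
        results[path.name] = transcriber.transcribe_audio(str(path))
    return results
```

Because the helper only depends on the `transcribe_audio` method, any object providing it works, which also makes the loop easy to test without real audio files.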
<!-- ## Training Procedure -->

<!-- ### Training Hyperparameters
```yaml
learning_rate: 0.001
train_batch_size: 8
lr_scheduler_warmup_steps: 50
num_epochs: 6
mixed_precision_training: Native AMP
``` -->

### Training Results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.7575        | 0.9998 | 2354 | 0.7068          |
| 0.6429        | 1.9998 | 4708 | 0.6073          |
| 0.5468        | 2.9998 | 7062 | 0.5428          |

### Framework Versions

- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Contributing to African NLP

Whosper-large-v3 is a step toward robust African language support. Join us by:

- Reporting issues or suggesting features on [GitHub](https://github.com/sudoping01/whosper)
- Adding Wolof speech data to enhance the model
- Translating documentation into Wolof
- Using it in research or education

## License

[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Citation

```bibtex
@misc{whosper2025,
  title={Whosper-large-v3: A Multilingual ASR Model for Wolof, French and English with Enhanced Code-Switching Capabilities},
  author={Seydou DIALLO},
  year={2025},
  publisher={Caytu Robotics}
}
```

## Acknowledgments

Developed by Seydou DIALLO at Caytu Robotics' AI Department, building on OpenAI's [Whisper-large-v2](https://huggingface.co/openai/whisper-large-v2). Special thanks to the Wolof-speaking community and contributors advancing African language technology.

## Try It Now!

Ready to transcribe Wolof audio with top-tier accuracy? Download Whosper-large-v3 and join the movement to advance African language technology!