anonymoussubmitter222 committed on
Commit beabbe4 · 1 Parent(s): d081411

new readme

Files changed (1)
  1. README.md +39 -17
README.md CHANGED
@@ -1,21 +1,43 @@
  ---
- license: apache-2.0
  ---
- # Tunisian Arabic ASR Model with wav2vec2 and Code-Switching
- This repository provides all the necessary tools to perform automatic speech recognition with an end-to-end system pretrained on the Tunisian Arabic dialect. The model uses a code-switching approach and can process English, French, and Tunisian Arabic.
- ## Performance
- The performance of the model is:
- | Release Version | WER (%) | CER (%) |
- |-----------------|---------|---------|
- | v1.0            | 29.47   | 12.44   |
- ## Pipeline
- The architecture comprises three components:
- * A French ASR model pretrained with wav2vec2 on French corpora
- * An English ASR model pretrained with wav2vec2 on English corpora
- * A custom Tunisian ASR model pretrained with wav2vec2 on a Tunisian Arabic corpus
- All three models process the audio data; the resulting posteriorgrams are then combined and used as input to the Mixer, which produces the final posteriorgrams.
- ## Install
- ```
- pip install speechbrain transformers
  ```
  ---
+ title: Tunisian Asr
+ emoji: 🐠
+ colorFrom: pink
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: 3.16.1
+ app_file: app.py
+ pinned: false
+ license: cc-by-nc-3.0
  ---
+ # Global description
+
+ This is a SpeechBrain-based Automatic Speech Recognition (ASR) model for Tunisian Arabic. It outputs Tunisian transcriptions written in Arabic script. Since the language has no standardized written form, transcriptions may vary. This model is the work of Salah Zaiem, PhD candidate, contact: [email protected]
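+
+ If the checkpoint is published as a SpeechBrain `EncoderASR` model (an assumption, not something stated on this card), usage could look like the sketch below. The `source` repo id is a placeholder, not a checkpoint name taken from this repository.
+
+ ```python
+ # Illustrative sketch only -- the repo id below is a placeholder, not the real checkpoint.
+ from speechbrain.pretrained import EncoderASR
+
+ asr = EncoderASR.from_hparams(
+     source="anonymoussubmitter222/tunisian-asr",  # placeholder / hypothetical repo id
+     savedir="pretrained_model",
+ )
+
+ # transcribe_file loads the audio, runs the pipeline, and returns Arabic-script text
+ print(asr.transcribe_file("example_tunisian.wav"))
+ ```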
+
+
+ # Pipeline description
+ This ASR system is composed of two different but linked blocks:
+ - An acoustic model (wavlm-large + CTC): a pretrained wavlm-large model (https://huggingface.co/microsoft/wavlm-large) combined with two DNN layers and finetuned on a Tunisian Arabic dataset.
+ - A KenLM-based 4-gram language model, learned on the training data.
+ The final acoustic representation is given to the CTC greedy decoder.
+ The system is trained on single-channel recordings sampled at 16 kHz. (The model should still perform reasonably on audio upsampled from 8 kHz.)
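+
+ Since the model expects 16 kHz single-channel input, audio recorded at other rates (for example 8 kHz telephone speech) should be brought to that format before transcription. The snippet below is a generic torchaudio sketch of that preprocessing step, not code taken from this repository; the file names are placeholders.
+
+ ```python
+ # Illustrative preprocessing sketch: convert an arbitrary recording to 16 kHz mono
+ import torchaudio
+
+ waveform, orig_sr = torchaudio.load("call_8khz.wav")   # placeholder input file
+ waveform = waveform.mean(dim=0, keepdim=True)          # downmix to a single channel
+ waveform = torchaudio.functional.resample(waveform, orig_freq=orig_sr, new_freq=16000)
+ torchaudio.save("call_16khz.wav", waveform, 16000)     # ready for the ASR pipeline
+ ```
+
+ The resulting 16 kHz file can then be fed to `transcribe_file` as in the sketch above.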
+
+ # Limitations
+ Due to the nature of the available training data, the model may encounter issues when dealing with foreign words. While it is common for Tunisian speakers to use foreign (mainly French) words, these will lead to more errors; we are working on improving this in future models.
+
+ Inference runs on CPU to keep this Space free to use, which leads to quite long running times on long sequences. If you want to transcribe long sequences for your project or research, feel free to drop an email here: [email protected]
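+
+ The Space serves the model through the Gradio app declared in the front matter (`app_file: app.py`). The following is only a sketch of what such an app could look like with the Gradio 3.x API; it is not the actual `app.py` of this Space, and the repo id is again a placeholder.
+
+ ```python
+ # Hypothetical app.py sketch: wrap the ASR model in a simple Gradio interface.
+ import gradio as gr
+ from speechbrain.pretrained import EncoderASR
+
+ asr = EncoderASR.from_hparams(
+     source="anonymoussubmitter222/tunisian-asr",  # placeholder / hypothetical repo id
+     savedir="pretrained_model",
+ )
+
+ def transcribe(audio_path: str) -> str:
+     # Gradio passes the recorded/uploaded file path; SpeechBrain returns the transcription.
+     return asr.transcribe_file(audio_path)
+
+ demo = gr.Interface(
+     fn=transcribe,
+     inputs=gr.Audio(source="microphone", type="filepath", label="Tunisian speech"),
+     outputs=gr.Textbox(label="Transcription (Arabic script)"),
+     title="Tunisian Arabic ASR",
+ )
+ demo.launch()
+ ```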
+
+ # Referencing SpeechBrain
+
+ This work has no published paper yet, and may never have one. If you use it in an academic setting, please cite SpeechBrain:
  ```
+ @misc{SB2021,
+   author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
+   title = {SpeechBrain},
+   year = {2021},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```