Transformers · PyTorch · wav2vec2 · pretraining · speech · xls_r · xls_r_pretrained
aconneau committed · Commit d2d08b5 · 1 Parent(s): 91461fd

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -11,14 +11,15 @@ license: apache-2.0
 
 # Wav2Vec2-XLS-R-300M
 
+![model image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/xls_r.png)
+
+
 [Facebook's Wav2Vec2 XLS-R](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
 
 XLS-R is Facebook AI's large-scale multilingual pretrained model for speech (the "XLM-R for Speech"). It is pretrained on 436k hours of unlabeled speech, including VoxPopuli, MLS, CommonVoice, BABEL and VoxLingua107, using the wav2vec 2.0 objective, in 128 languages. When using the model, make sure that your speech input is sampled at 16 kHz. Note that this model should be fine-tuned on a downstream task, like Automatic Speech Recognition, Translation or Classification. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for more information about ASR.
 
 [XLS-R Paper](https://arxiv.org/abs/)
 
-![model image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/xls_r.png)
-
 Authors: Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
 
 **Abstract**
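The model card above notes that XLS-R expects speech input sampled at 16 kHz, while source audio is commonly recorded at 44.1 kHz or 48 kHz. Below is a minimal, illustrative sketch of resampling to 16 kHz before feature extraction, using naive linear interpolation with NumPy; in practice a proper resampler such as `torchaudio.functional.resample` or `librosa.resample` is the usual choice.

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampler (for illustration only;
    use torchaudio or librosa for anti-aliased resampling in practice)."""
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# One second of audio at 44.1 kHz (silence as a stand-in for real speech).
audio_44k = np.zeros(44100)
audio_16k = resample_linear(audio_44k, orig_sr=44100)
# audio_16k now has 16000 samples, i.e. one second at the 16 kHz rate
# the model expects; pass it to a Wav2Vec2 feature extractor with
# sampling_rate=16000.
```

The resampled array can then be fed to a `Wav2Vec2FeatureExtractor` loaded from the pretrained checkpoint, with `sampling_rate=16000` passed explicitly so the extractor can verify the rate.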