---
language: "fr"
thumbnail:
tags:
- wav2vec2
license: "apache-2.0"
---

# LeBenchmark: wav2vec2 base model trained on 1K hours of French *female-only* speech

LeBenchmark provides a set of wav2vec2 models pretrained on different French datasets containing spontaneous, read, and broadcast speech.

For more information about our gender study for self-supervised learning (SSL) models, please refer to our paper: [A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems](https://arxiv.org/abs/2204.01397)

## Model and data descriptions

We release four gender-specific models trained on 1K hours of speech.

- [wav2vec2-FR-1K-Male-large](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Male-large/)
- [wav2vec2-FR-1K-Male-base](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Male-base/)
- [wav2vec2-FR-1K-Female-large](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Female-large/)
- [wav2vec2-FR-1K-Female-base](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Female-base/)

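The four repositories listed above follow a regular naming pattern, so a small helper can select one of them by gender and size. The sketch below is illustrative only and not an official loading recipe: `repo_id` and `extract_features` are hypothetical names, and it assumes that the checkpoint is stored in a format the `transformers` `Wav2Vec2Model` class can read and that the input is 16 kHz mono audio.

```python
from typing import Literal

ORG = "LeBenchmark"  # organisation name taken from the URLs listed above


def repo_id(gender: Literal["Male", "Female"], size: Literal["base", "large"]) -> str:
    """Build the Hub repository ID of one of the four gender-specific models."""
    if gender not in ("Male", "Female"):
        raise ValueError(f"unknown gender: {gender!r}")
    if size not in ("base", "large"):
        raise ValueError(f"unknown size: {size!r}")
    # Note: the repository IDs use "wav2vec", not "wav2vec2", as in the URLs above.
    return f"{ORG}/wav2vec-FR-1K-{gender}-{size}"


def extract_features(waveform: "torch.Tensor", model_id: str = repo_id("Female", "base")):
    """Return frame-level representations for a 1-D waveform tensor.

    Assumptions (not confirmed by this model card): the checkpoint loads with
    Wav2Vec2Model, and `waveform` is 16 kHz mono float audio.
    """
    # Imported lazily so that repo_id() works without torch/transformers installed.
    import torch
    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained(model_id)
    model.eval()
    with torch.no_grad():
        # Add a batch dimension: (samples,) -> (1, samples).
        outputs = model(waveform.unsqueeze(0))
    # Shape: (1, frames, hidden_size).
    return outputs.last_hidden_state
```

Calling `extract_features` downloads the weights from the Hub on first use. Depending on how the checkpoint was exported, the input may also need to be normalized first, so check the repository files before relying on this.
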
## Intended uses & limitations

Pretrained wav2vec2 models are distributed under the Apache-2.0 license, so they can be reused broadly without strict limitations. However, the benchmarks and data may be tied to corpora that are not fully open.

## Referencing our gender-specific models

```
@inproceedings{boito22_interspeech,
  author={Marcely Zanon Boito and Laurent Besacier and Natalia Tomashenko and Yannick Estève},
  title={{A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={1278--1282},
  doi={10.21437/Interspeech.2022-353}
}
```

## Referencing LeBenchmark

```
@inproceedings{evain2021task,
  title={Task agnostic and task specific self-supervised learning from speech with \textit{LeBenchmark}},
  author={Evain, Sol{\`e}ne and Nguyen, Ha and Le, Hang and Boito, Marcely Zanon and Mdhaffar, Salima and Alisamir, Sina and Tong, Ziyi and Tomashenko, Natalia and Dinarelli, Marco and Parcollet, Titouan and others},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
  year={2021}
}
```