---
language:
- fi
tags:
- multi-task
---

This is the best-performing multi-task wav2vec 2.0 model for Finnish from [__Getman, Y., Al-Ghezi, R., Grósz, T., Kurimo, M. (2023) Multi-task wav2vec2 Serving as a Pronunciation Training System for Children__](https://www.isca-speech.org/archive/slate_2023/getman23_slate.html). It performs automatic speech recognition (ASR) and pronunciation rating simultaneously.

## Usage

Install [aalto-speech/multitask-wav2vec2](https://github.com/aalto-speech/multitask-wav2vec2) first; it provides the `Wav2Vec2ForMultiTask` class imported below. The model can then be used directly as follows:

```python
import torch
import librosa
import datasets
from transformers import Wav2Vec2ForMultiTask, Wav2Vec2Processor

# Placeholders (not part of the original card): set these to your own values.
MODEL_PATH = "path/to/this/model"    # Hub ID or local directory of this model
AUDIO_FILES = ["path/to/audio.wav"]  # audio files to transcribe and rate

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def map_to_array(batch):
    # Load the audio as 16 kHz mono, matching the sampling rate passed to the processor
    speech, _ = librosa.load(batch["file"], sr=16000, mono=True)
    batch["speech"] = speech
    return batch

def map_to_pred_multitask(batch):
    input_values = processor(batch["speech"], sampling_rate=16000, return_tensors="pt", padding="longest").input_values
    with torch.no_grad():
        logits = model(input_values.to(device)).logits
    # logits[1] is the CTC (ASR) output: take the per-frame argmax and decode it to text
    predicted_ids_ctc = torch.argmax(logits[1], dim=-1)
    batch["transcription"] = processor.batch_decode(predicted_ids_ctc)
    # logits[0] is the pronunciation rating output: the argmax is the predicted class
    predicted_ids = torch.argmax(logits[0], dim=-1)
    batch["predictions"] = predicted_ids.cpu().tolist()
    return batch

processor = Wav2Vec2Processor.from_pretrained(MODEL_PATH)
# Move the model to the same device as the inputs
model = Wav2Vec2ForMultiTask.from_pretrained(MODEL_PATH).to(device)

# Each example needs a "file" column holding the path to an audio file
test_dataset = datasets.Dataset.from_dict({"file": AUDIO_FILES})
test_dataset = test_dataset.map(map_to_array)
result = test_dataset.map(map_to_pred_multitask)
```
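
To make the transcription step concrete, here is a minimal, library-free sketch of what greedy CTC decoding does conceptually after `torch.argmax`: consecutive repeated frame labels are collapsed, blank tokens are dropped, and the word delimiter becomes a space. The toy vocabulary and the `greedy_ctc_decode` helper are assumptions made for illustration only; in practice `processor.batch_decode` does this with the vocabulary shipped with the model.

```python
from itertools import groupby

# Toy vocabulary, assumed for illustration only; the real one ships with the processor.
VOCAB = {0: "<pad>", 1: "|", 2: "k", 3: "i", 4: "s", 5: "a"}
BLANK_ID = 0          # assumption: the pad token doubles as the CTC blank
WORD_DELIMITER = "|"  # assumption: rendered as a space in the final text


def greedy_ctc_decode(frame_ids):
    """Collapse repeated frame labels, drop blanks, and join the rest into text."""
    # Step 1: merge runs of identical consecutive IDs into a single ID
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove blanks; a blank between two equal IDs keeps both letters
    tokens = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    # Step 3: turn word delimiters into spaces
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()


# Hypothetical per-frame argmax IDs for one short utterance
frame_ids = [0, 2, 2, 0, 3, 3, 3, 0, 4, 4, 0, 4, 5, 5, 0]
print(greedy_ctc_decode(frame_ids))  # prints "kissa"
```

The blank between the two runs of `s` is what preserves the double consonant; without it, the repeated frames would collapse into a single letter.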

## Citation

If you use our models or training scripts, please cite our article as:

```bibtex
@inproceedings{getman23_slate,
  author={Yaroslav Getman and Ragheb Al-Ghezi and Tamas Grosz and Mikko Kurimo},
  title={{Multi-task wav2vec2 Serving as a Pronunciation Training System for Children}},
  year=2023,
  booktitle={Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE)},
  pages={36--40},
  doi={10.21437/SLaTE.2023-8}
}
```