GetmanY1 committed on
Commit 00327ea
1 Parent(s): 3677fe3

Update README.md

Files changed (1): README.md +52 -1
---
language:
- fi
---

The best-performing multi-task wav2vec2 model for Finnish from __Getman, Y., Al-Ghezi, R., Grósz, T., Kurimo, M. (2023) Multi-task wav2vec2 Serving as a Pronunciation Training System for Children__. It performs automatic speech recognition (ASR) and pronunciation rating simultaneously.

## Usage

To use this model, you must first install [aalto-speech/multitask-wav2vec2](https://github.com/aalto-speech/multitask-wav2vec2); the `Wav2Vec2ForMultiTask` class is not part of the standard `transformers` release. The model can then be used directly as follows:

```python
import torch
import librosa
import datasets
# Wav2Vec2ForMultiTask is only available after installing aalto-speech/multitask-wav2vec2
from transformers import Wav2Vec2ForMultiTask, Wav2Vec2Processor

MODEL_PATH = "path/to/model"  # placeholder: local path or Hub ID of this model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = Wav2Vec2Processor.from_pretrained(MODEL_PATH)
# Put the model on the same device as the inputs
model = Wav2Vec2ForMultiTask.from_pretrained(MODEL_PATH).to(device)

def map_to_array(batch):
    # Load each audio file as 16 kHz mono, matching the sampling rate passed to the processor
    speech, _ = librosa.load(batch["file"], sr=16000, mono=True)
    batch["speech"] = speech
    return batch

def map_to_pred_multitask(batch):
    input_values = processor(batch["speech"], sampling_rate=16000, return_tensors="pt", padding="longest").input_values
    with torch.no_grad():
        logits = model(input_values.to(device)).logits
    # logits[1]: ASR (CTC) head, greedily decoded into text
    predicted_ids_ctc = torch.argmax(logits[1], dim=-1)
    batch["transcription"] = processor.batch_decode(predicted_ids_ctc)
    # logits[0]: pronunciation rating head; argmax gives the predicted rating class
    predicted_ids = torch.argmax(logits[0], dim=-1)
    batch["predictions"] = predicted_ids.cpu().tolist()
    return batch

# Placeholder dataset with a "file" column of audio paths; replace with your own data
test_dataset = datasets.Dataset.from_dict({"file": ["example.wav"]})
test_dataset = test_dataset.map(map_to_array)
result = test_dataset.map(map_to_pred_multitask)
```
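
The usage example turns both heads' outputs into results with `argmax`. For the CTC head, `processor.batch_decode` then roughly collapses repeated frame labels, drops the blank (pad) token, and maps the `|` word delimiter to a space. The following is a simplified pure-Python sketch of that post-processing, assuming a hypothetical toy vocabulary and hypothetical logits; it is not this model's real vocabulary or the library's implementation.

```python
from itertools import groupby

# Hypothetical toy vocabulary: index 0 is the CTC blank (the pad token in
# Wav2Vec2 CTC tokenizers) and "|" is the word delimiter.
VOCAB = ["<pad>", "|", "k", "i", "s", "a"]
BLANK_ID = 0


def greedy_ctc_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID):
    """Collapse repeated frame labels, drop blanks, and turn '|' into spaces."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    chars = [vocab[i] for i in collapsed if i != blank_id]
    return "".join(chars).replace("|", " ").strip()


def argmax(values):
    """Index of the largest value, as torch.argmax(..., dim=-1) picks per row."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical per-frame IDs, like torch.argmax(logits[1], dim=-1) for one utterance
frame_ids = [2, 2, 0, 3, 3, 4, 0, 4, 5, 1, 1]
print(greedy_ctc_decode(frame_ids))  # kissa

# Hypothetical rating-head logits for one utterance
rating_logits = [0.1, 2.3, 0.7]
print(argmax(rating_logits))  # 1
```

The blank between the two `s` frames is what lets the double letter in *kissa* survive the collapse step; without it, the repeated `s` labels would merge into a single `s`.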

## Citation

If you use our models or training scripts, please cite our article as:

```bibtex
@inproceedings{getman23_slate,
  author={Yaroslav Getman and Ragheb Al-Ghezi and Tamás Grósz and Mikko Kurimo},
  title={{Multi-task wav2vec2 Serving as a Pronunciation Training System for Children}},
  year=2023,
  booktitle={Proc. 9th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2023)},
  pages={TODO},
  doi={TODO}
}
```