IbrahimSalah committed on
Commit c4ef99f · 1 Parent(s): 8b5f138

Create README.md

Files changed (1)
  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
+ # Arabic syllable recognition with tashkeel
+ This is a fine-tuned wav2vec2 model that recognizes Arabic syllables from speech.
+ The model was trained on a Modern Standard Arabic dataset.\
+ A 5-gram language model is available with the model.
+
+ To try it out:
+
+ ```
+ !pip install datasets transformers
+ !pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode
+ ```
+
+ ```
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
+
+ # The processor bundles the feature extractor, the tokenizer, and the language-model decoder.
+ processor = Wav2Vec2ProcessorWithLM.from_pretrained("IbrahimSalah/Syllables_final_Large")
+ model = Wav2Vec2ForCTC.from_pretrained("IbrahimSalah/Syllables_final_Large")
+ ```
+ ```
+ import pandas as pd
+ from datasets import Dataset
+
+ path = '/content/908-33.wav'  # audio path
+ # Build a one-row dataset whose 'audio' column holds the file path.
+ dftest = pd.DataFrame({'audio': [path]})
+ dataset = Dataset.from_pandas(dftest)
+ ```
+ ```
+ import torch
+ import torchaudio
+
+ def speech_file_to_array_fn(batch):
+     # Load the waveform together with its native sampling rate.
+     speech_array, sampling_rate = torchaudio.load(batch["audio"])
+     print(sampling_rate)
+     # Resample to 16 kHz. The original data was sampled at 48,000 Hz;
+     # the source rate is read from the file, so other input rates work too.
+     resampler = torchaudio.transforms.Resample(sampling_rate, 16_000)
+     batch["audio"] = resampler(speech_array).squeeze().numpy()
+     return batch
+ ```
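Resampling to 16 kHz fixes the relationship between audio duration and the number of frames the model scores. As a rough sketch, assuming this checkpoint keeps the default wav2vec2 feature-encoder layout (seven unpadded convolutions with the kernel sizes and strides below; this is an assumption, not something the README states), the frame count for a clip can be estimated as follows. The helper names are hypothetical.

```python
import math

# Assumed: default wav2vec2 feature-encoder kernels and strides
# (the defaults in Wav2Vec2Config); the actual checkpoint may differ.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def resampled_length(num_samples: int, orig_sr: int, target_sr: int = 16_000) -> int:
    """Approximate sample count after resampling (duration is preserved)."""
    return math.ceil(num_samples * target_sr / orig_sr)


def num_logit_frames(num_samples: int) -> int:
    """Estimate how many frames the encoder emits for a 16 kHz waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Unpadded 1-D convolution: floor((length - kernel) / stride) + 1
        length = (length - kernel) // stride + 1
    return length


# One second recorded at 48 kHz becomes 16,000 samples at 16 kHz.
samples_16k = resampled_length(48_000, orig_sr=48_000)
print(samples_16k, num_logit_frames(samples_16k))
```

Under these assumptions one second of audio yields 49 frames, roughly one every 20 ms, which should correspond to the second dimension of the logits the model returns.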
+ ```
+ # Resample the audio, then run the model on the 16 kHz waveform.
+ test_dataset = dataset.map(speech_file_to_array_fn)
+ inputs = processor(test_dataset["audio"], sampling_rate=16_000, return_tensors="pt", padding=True)
+ with torch.no_grad():
+     logits = model(inputs.input_values).logits
+ print(logits.numpy().shape)  # (batch, frames, vocabulary size)
+
+ # batch_decode decodes with the language model; .text holds the decoded strings.
+ transcription = processor.batch_decode(logits.numpy()).text
+ print("Prediction:", transcription[0])
+ ```
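The logits hold one score per vocabulary entry for every frame, and `processor.batch_decode` turns them into text with the help of the language model. A simpler way to see what a CTC model's output means is greedy decoding: take the best entry per frame, merge consecutive repeats, and drop the blank token. The sketch below uses a made-up four-entry vocabulary and toy scores (both hypothetical, not the model's real vocabulary), and it is not the decoder the README uses.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary; index 0 plays the role of the CTC blank token.
VOCAB = ["<pad>", "ka", "ta", "ba"]
BLANK_ID = 0


def ctc_greedy_decode(logits: Sequence[Sequence[float]],
                      vocab: Sequence[str] = VOCAB,
                      blank_id: int = BLANK_ID) -> List[str]:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded = []
    previous = None
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit a token only when it differs from the previous frame's choice
        # and is not the blank token.
        if best != previous and best != blank_id:
            decoded.append(vocab[best])
        previous = best
    return decoded


# Toy scores for six frames: ka, ka, blank, ka, ta, blank.
toy_logits = [
    [0.1, 2.0, 0.3, 0.1],
    [0.2, 1.5, 0.1, 0.0],
    [3.0, 0.1, 0.2, 0.1],
    [0.1, 2.2, 0.4, 0.3],
    [0.0, 0.3, 1.9, 0.2],
    [2.5, 0.2, 0.1, 0.4],
]
print(ctc_greedy_decode(toy_logits))
```

The blank in the third frame is what lets the same syllable appear twice in a row: the result here is `['ka', 'ka', 'ta']`, whereas without the blank the two `ka` runs would merge into one.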
+
+
+
+