eddiegulay committed on
Commit b3c2544 · 1 Parent(s): 7a445e8

Update README.md

Files changed (1)
  1. README.md +27 -52
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  license: apache-2.0
- base_model: facebook/wav2vec2-large-xlsr-53
  tags:
  - generated_from_trainer
  datasets:
@@ -22,67 +22,42 @@ model-index:
  metrics:
  - name: Wer
  type: wer
- value: 0.32237526397075045
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

  # wav2vec2-large-xlsr-mvc-swahili

- This model is a fine-tuned version of [eddiegulay/wav2vec2-large-xlsr-mvc-swahili](https://huggingface.co/eddiegulay/wav2vec2-large-xlsr-mvc-swahili) on the common_voice_13_0 dataset.
- It achieves the following results on the evaluation set:
- - Loss: inf
- - Wer: 0.3224

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0003
- - train_batch_size: 16
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 500
- - num_epochs: 2
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:-----:|:----:|:---------------:|:------:|
- | No log | 0.17 | 100 | inf | 1.0 |
- | No log | 0.34 | 200 | inf | 1.0 |
- | No log | 0.5 | 300 | inf | 0.3420 |
- | 3.3446 | 0.67 | 400 | inf | 0.3431 |
- | 3.3446 | 0.84 | 500 | inf | 0.3500 |
- | 3.3446 | 1.01 | 600 | inf | 0.3433 |
- | 3.3446 | 1.17 | 700 | inf | 0.3347 |
- | 0.1975 | 1.34 | 800 | inf | 0.3340 |
- | 0.1975 | 1.51 | 900 | inf | 0.3307 |
- | 0.1975 | 1.68 | 1000 | inf | 0.3233 |
- | 0.1975 | 1.84 | 1100 | inf | 0.3224 |
-
-
- ### Framework versions
-
- - Transformers 4.35.0
- - Pytorch 2.1.0
- - Datasets 2.14.6
- - Tokenizers 0.14.1
 
  ---
  license: apache-2.0
+ base_model: facebook/wav2vec2-large-xlsr-53
  tags:
  - generated_from_trainer
  datasets:
 
  metrics:
  - name: Wer
  type: wer
+ value: 0.2
+ language:
+ - sw
  ---


  # wav2vec2-large-xlsr-mvc-swahili

+ This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53, inspired by [alamsher/wav2vec2-large-xlsr-53-common-voice-sw](https://huggingface.co/alamsher/wav2vec2-large-xlsr-53-common-voice-sw).

+ # How to use the model

+ There was an issue with the vocabulary: it appears to include special characters that were not accounted for during training.
+ You could try:
+ ```python
+ import torch
+ import torchaudio
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ repo_name = "eddiegulay/wav2vec2-large-xlsr-mvc-swahili"
+ # Use the GPU when one is available, otherwise fall back to the CPU
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ processor = AutoProcessor.from_pretrained(repo_name)
+ model = AutoModelForCTC.from_pretrained(repo_name).to(device)
+
+ def transcribe(audio_path):
+     # Load the audio file and resample it to 16 kHz
+     audio_input, sample_rate = torchaudio.load(audio_path)
+     target_sample_rate = 16000
+     audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)(audio_input)
+
+     # Preprocess the first audio channel
+     input_dict = processor(audio_input[0].numpy(), return_tensors="pt", padding=True, sampling_rate=target_sample_rate)
+
+     # Perform inference (no gradients needed) and transcribe
+     with torch.no_grad():
+         logits = model(input_dict.input_values.to(device)).logits
+     pred_ids = torch.argmax(logits, dim=-1)[0]
+     transcription = processor.decode(pred_ids)
+
+     return transcription
+
+ transcript = transcribe('your_audio.mp3')
+ ```
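
Note on the `wer` value in the metadata above: word error rate is commonly defined as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is only an illustration of that definition, not the evaluation code used for this model. The function name `word_error_rate` and the example sentences are hypothetical and do not come from the commit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


# Hypothetical example: one substituted word out of four reference words.
print(word_error_rate("habari ya asubuhi rafiki", "habari za asubuhi rafiki"))  # 0.25
```

A score of 0 means the transcript matches the reference word for word. Values can exceed 1 when the hypothesis contains many inserted words.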