eddiegulay/wav2vec2-large-xlsr-mvc-swahili

@@ -1,6 +1,6 @@
 ---
 license: apache-2.0
-base_model:  facebook/wav2vec2-large-xlsr-53
 tags:
 - generated_from_trainer
 datasets:
@@ -22,67 +22,42 @@ model-index:
     metrics:
     - name: Wer
       type: wer
-      value: 0.32237526397075045
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # wav2vec2-large-xlsr-mvc-swahili
-This model is a fine-tuned version of [eddiegulay/wav2vec2-large-xlsr-mvc-swahili](https://huggingface.co/eddiegulay/wav2vec2-large-xlsr-mvc-swahili) on the common_voice_13_0 dataset.
-It achieves the following results on the evaluation set:
-- Loss: inf
-- Wer: 0.3224
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0003
-- train_batch_size: 16
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 2
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Wer    |
-|:-------------:|:-----:|:----:|:---------------:|:------:|
-| No log        | 0.17  | 100  | inf             | 1.0    |
-| No log        | 0.34  | 200  | inf             | 1.0    |
-| No log        | 0.5   | 300  | inf             | 0.3420 |
-| 3.3446        | 0.67  | 400  | inf             | 0.3431 |
-| 3.3446        | 0.84  | 500  | inf             | 0.3500 |
-| 3.3446        | 1.01  | 600  | inf             | 0.3433 |
-| 3.3446        | 1.17  | 700  | inf             | 0.3347 |
-| 0.1975        | 1.34  | 800  | inf             | 0.3340 |
-| 0.1975        | 1.51  | 900  | inf             | 0.3307 |
-| 0.1975        | 1.68  | 1000 | inf             | 0.3233 |
-| 0.1975        | 1.84  | 1100 | inf             | 0.3224 |
-### Framework versions
-- Transformers 4.35.0
-- Pytorch 2.1.0
-- Datasets 2.14.6
-- Tokenizers 0.14.1
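Both versions of the card report word error rate (Wer) as the evaluation metric, for example 0.3224 in the removed version. WER is the word-level edit distance (substitutions + deletions + insertions) between the reference and the predicted transcript, divided by the number of reference words. The following is a minimal from-scratch sketch. It assumes whitespace tokenization and corpus-level aggregation (total errors over total reference words). The card does not say how its WER was computed or which text normalization was applied, so `edit_distance` and `corpus_wer` are illustrative helpers only.

```python
def edit_distance(ref_words, hyp_words):
    """Minimum substitutions + deletions + insertions turning ref_words into hyp_words."""
    # prev[j] = distance between the first i-1 reference words and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp_words)]


def corpus_wer(references, hypotheses):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    total_words = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return errors / total_words
```

For example, `corpus_wer(["habari ya asubuhi"], ["habari za asubuhi"])` gives one substitution over three reference words, about 0.333.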

 ---
 license: apache-2.0
+base_model: facebook/wav2vec2-large-xlsr-53
 tags:
 - generated_from_trainer
 datasets:
     metrics:
     - name: Wer
       type: wer
+      value: 0.2
+language:
+- sw
 ---
 # wav2vec2-large-xlsr-mvc-swahili
+This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53), inspired by [alamsher/wav2vec2-large-xlsr-53-common-voice-sw](https://huggingface.co/alamsher/wav2vec2-large-xlsr-53-common-voice-sw).
+## How to use the model
+There is a known issue with the vocabulary: it appears to include special characters that were not taken into account during training.
+You can try the following:
+```python
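+import torch
+import torchaudio
+from transformers import AutoProcessor, AutoModelForCTC
+
+repo_name = "eddiegulay/wav2vec2-large-xlsr-mvc-swahili"
+# Use the GPU when available; the model and its inputs must be on the same device
+device = "cuda" if torch.cuda.is_available() else "cpu"
+processor = AutoProcessor.from_pretrained(repo_name)
+model = AutoModelForCTC.from_pretrained(repo_name).to(device)
+model.eval()
+
+def transcribe(audio_path):
+  # Load the audio file (tensor shape: [channels, samples])
+  audio_input, sample_rate = torchaudio.load(audio_path)
+  # Resample to the 16 kHz rate expected by wav2vec2
+  target_sample_rate = 16000
+  audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)(audio_input)
+  # Preprocess the first channel into model inputs
+  input_dict = processor(audio_input[0].numpy(), return_tensors="pt", padding=True, sampling_rate=target_sample_rate)
+  # Perform inference without tracking gradients
+  with torch.no_grad():
+    logits = model(input_dict.input_values.to(device)).logits
+  # Greedy decoding: pick the most likely token for each frame
+  pred_ids = torch.argmax(logits, dim=-1)[0]
+  transcription = processor.decode(pred_ids)
+  return transcription
+
+transcript = transcribe('your_audio.mp3')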
+```
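In the snippet above, `torch.argmax` picks one token per audio frame and `processor.decode` turns that frame sequence into text. With the Hugging Face Wav2Vec2 CTC tokenizer, this roughly means collapsing repeated ids, dropping the padding token (which serves as the CTC blank), and mapping the word delimiter `|` to a space. The following pure-Python sketch shows that greedy CTC decoding step. The toy vocabulary and the helper name `ctc_greedy_decode` are hypothetical. This repository's actual token ids and special tokens are not documented in the card.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id, word_delimiter="|"):
    """Turn per-frame argmax ids into text using greedy CTC decoding."""
    # 1. Merge consecutive duplicate ids, e.g. [2, 2, 3] -> [2, 3]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal ids keeps them as separate characters
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    # 3. Map ids to characters and turn the word delimiter into a space
    text = "".join(id_to_token[token_id] for token_id in kept)
    return text.replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank (pad) token
toy_vocab = {0: "<pad>", 1: "|", 2: "h", 3: "a", 4: "b", 5: "r", 6: "i"}
frames = [2, 2, 3, 0, 4, 3, 3, 5, 6]  # frame-level predictions for "habari"
print(ctc_greedy_decode(frames, toy_vocab, blank_id=0))
```

Because step 3 maps every surviving id straight to its vocabulary entry, any special characters that ended up in the vocabulary would appear verbatim in the output text.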