---
library_name: transformers
base_model: openai/whisper-tiny
tags:
- generated_from_trainer
datasets:
- common_voice_11_0
model-index:
- name: whisper-fa-tinyyy
results: []
license: mit
language:
- fa
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# whisper-fa-tinyyy
This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the common_voice_11_0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0246
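For a quick check outside Colab, the checkpoint can also be driven through the standard `transformers` ASR pipeline. The snippet below is a minimal sketch, not part of the original training setup; the audio path `sample_fa.wav` is a placeholder for any short Persian recording (up to ~30 seconds).

```python
from transformers import pipeline

# Minimal sketch: load the fine-tuned checkpoint directly from the Hub.
asr = pipeline("automatic-speech-recognition", model="hackergeek98/whisper-fa-tinyyy")

# "sample_fa.wav" is a placeholder path; any short Persian audio file works.
print(asr("sample_fa.wav")["text"])
```

For audio longer than 30 seconds, see the chunked Colab workflow below.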
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1
- mixed_precision_training: Native AMP
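These settings correspond roughly to a `Seq2SeqTrainingArguments` configuration like the sketch below. This is a hedged reconstruction from the list above, not the original training script; the `output_dir` name is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

# Hedged reconstruction of the reported hyperparameters.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-fa-tinyyy",   # assumed output directory
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,    # effective train batch size: 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=1,
    fp16=True,                        # mixed precision ("Native AMP")
    seed=42,
)
```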
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0186 | 0.9998 | 2357 | 0.0246 |
### Framework versions
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
## How to use the model in Colab
```python
# Install required packages (ffmpeg and the google.colab module ship with Colab)
!pip install torch torchaudio transformers pydub

import os

import torch
from pydub import AudioSegment
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from google.colab import files

# Load the model and processor
model_id = "hackergeek98/whisper-fa-tinyyy"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Create the ASR pipeline
whisper_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if torch.cuda.is_available() else -1,
)

# Convert any input audio to WAV format
def convert_to_wav(audio_path):
    audio = AudioSegment.from_file(audio_path)
    wav_path = "converted_audio.wav"
    audio.export(wav_path, format="wav")
    return wav_path

# Split long audio into 30-second chunks so each piece fits Whisper's window
def split_audio(audio_path, chunk_length_ms=30000):
    audio = AudioSegment.from_wav(audio_path)
    chunks = [audio[i:i + chunk_length_ms] for i in range(0, len(audio), chunk_length_ms)]
    chunk_paths = []
    for i, chunk in enumerate(chunks):
        chunk_path = f"chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")
        chunk_paths.append(chunk_path)
    return chunk_paths

# Transcribe a long audio file chunk by chunk
def transcribe_long_audio(audio_path):
    wav_path = convert_to_wav(audio_path)
    chunk_paths = split_audio(wav_path)
    transcription = ""
    for chunk in chunk_paths:
        result = whisper_pipe(chunk)
        transcription += result["text"] + "\n"
        os.remove(chunk)  # Remove the processed chunk
    os.remove(wav_path)  # Clean up the converted file

    # Save the transcription to a text file (UTF-8 for Persian text)
    text_path = "transcription.txt"
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(transcription)
    return text_path

# Upload and process audio in Colab
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]
transcription_file = transcribe_long_audio(audio_file)

# Download the transcription file
files.download(transcription_file)
```
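The card tags `wer` as the tracked metric, but no WER score is reported above. If you want to measure it yourself, the sketch below uses the `evaluate` library; the prediction and reference strings are placeholder examples, and in practice they would come from running the model over your Persian evaluation set.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder strings; replace with model transcriptions and reference texts.
predictions = ["سلام دنیا"]
references = ["سلام دنیا"]

print(wer_metric.compute(predictions=predictions, references=references))  # 0.0 for identical strings
```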