---
library_name: peft
base_model: openai/whisper-large-v2
datasets:
- mozilla-foundation/common_voice_16_0
language:
- ja
metrics:
- wer
---

# Model Card for openai-whisper-large-v2-LORA-ja

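A LoRA adapter for Japanese transcription with Whisper Large V2. Testing is still in progress, so results are not available yet. The main personal use case is transcribing Japanese comedy.

Running the model with this LoRA uses about 9 GB of VRAM.
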
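As a rough sanity check on that figure, the memory taken by the base model's fp16 weights can be estimated from its parameter count. This is a sketch under stated assumptions: about 1.55 billion parameters for Whisper Large V2 (the size OpenAI lists for its large models), 2 bytes per fp16 parameter, and the small LoRA weights ignored.

```python
# Rough estimate of the GPU memory taken by the base model weights in fp16.
# Assumptions: ~1.55e9 parameters for Whisper Large V2; LoRA weights ignored.
BASE_PARAMS = 1.55e9
BYTES_PER_FP16_PARAM = 2


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


weights_gib = weight_memory_gib(BASE_PARAMS, BYTES_PER_FP16_PARAM)
print(f"fp16 weights: ~{weights_gib:.1f} GiB")
```

Under these assumptions the weights account for about 3.1 GB (2.9 GiB). The rest of the ~9 GB presumably goes to activations for batched decoding (the usage example uses `batch_size=8`), the CUDA context, and allocator overhead; this breakdown is an estimate, not a measurement.
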
## Model Details

### Model Description

openai-whisper-large-v2-LORA-ja

- **Developed by:** FZNX
- **Model type:** PEFT LoRA adapter
- **Language(s) (NLP):** Japanese (fine-tuned on Common Voice 16.0)
- **License:** [More Information Needed]
- **Finetuned from model:** Whisper Large V2 (`openai/whisper-large-v2`)

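For readers new to LoRA: instead of updating a frozen weight matrix `W`, LoRA trains two small matrices, `B` (d × r) and `A` (r × k), and adds their product scaled by `alpha / r`, which is how PEFT scales `lora_alpha` by the rank `r`. The pure-Python sketch below uses toy 2 × 2 matrices with hypothetical rank and alpha values (not this adapter's actual configuration; dropout is omitted) to show that running the adapter path next to the base weight gives the same output as merging it into the weight.

```python
# Toy illustration of a LoRA update on one linear layer (pure Python).
# All sizes and values are hypothetical; they are not this adapter's settings.


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add_scaled(X, Y, s):
    """Return X + s * Y elementwise."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lora_merge(W, A, B, alpha, r):
    """Merged weight: W + (alpha / r) * B @ A."""
    return add_scaled(W, matmul(B, A), alpha / r)


def lora_forward(x, W, A, B, alpha, r):
    """Unmerged forward pass: W x + (alpha / r) * B (A x)."""
    base = matmul(W, x)
    delta = matmul(B, matmul(A, x))
    return add_scaled(base, delta, alpha / r)


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (d=2, k=2)
A = [[1.0, 2.0]]              # r x k, with rank r=1
B = [[0.5], [-1.0]]           # d x r
alpha, r = 2.0, 1
x = [[3.0], [4.0]]            # input as a column vector

merged = lora_merge(W, A, B, alpha, r)
print(matmul(merged, x))
print(lora_forward(x, W, A, B, alpha, r))
```

Both lines print the same vector. This equivalence is what lets a trained adapter be folded into the base weights for deployment, at the cost of no longer being able to swap it out.
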
## How to Get Started with the Model

```python
import torch
from peft import PeftConfig, PeftModel
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

peft_model_id = "fznx92/openai-whisper-large-v2-ja-transcribe-colab"
sample = "path/to/audio.mp3"  # replace with your audio file (decoding MP3 needs ffmpeg)

language = "japanese"
task = "transcribe"

# Load the base model named in the adapter config, then attach the LoRA adapter.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path,
)
model = PeftModel.from_pretrained(model, peft_model_id)
# Cast to fp16 before moving to the GPU so the fp32 weights never occupy VRAM.
model = model.half().to("cuda")

processor = WhisperProcessor.from_pretrained(
    peft_config.base_model_name_or_path, language=language, task=task
)

pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    batch_size=8,
    torch_dtype=torch.float16,
    device="cuda:0",
)


def transcribe(audio, return_timestamps=False):
    # Split long audio into 30 s chunks and force Japanese transcription.
    result = pipe(
        audio,
        chunk_length_s=30,
        return_timestamps=return_timestamps,
        generate_kwargs={"language": language, "task": task},
    )
    return result["text"]


transcript = transcribe(sample)
print(transcript)
```

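Since the main use case is comedy, subtitles may be more useful than a single block of text. With `return_timestamps=True`, the transformers ASR pipeline also returns a `"chunks"` list whose entries each hold a `"timestamp"` tuple of `(start, end)` seconds and a `"text"` string. The helper below is a minimal sketch that turns such a list into SRT subtitles; the names `chunks_to_srt` and `format_srt_time` are hypothetical (not part of transformers or PEFT), and it assumes a missing end time, which can happen for the final chunk, should fall back to the start time plus a fixed duration.

```python
# Hypothetical helper: convert ASR pipeline chunks into SRT subtitle text.
# Assumes chunks look like {"timestamp": (start_s, end_s), "text": "..."}.


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks, fallback_duration: float = 2.0) -> str:
    """Build numbered SRT entries from timestamped chunks."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # the last chunk may lack an end time
            end = start + fallback_duration
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)  # blank line between entries


example_chunks = [
    {"timestamp": (0.0, 2.5), "text": "こんにちは"},
    {"timestamp": (2.5, None), "text": "ありがとう"},
]
print(chunks_to_srt(example_chunks))
```

With the pipeline from the usage example, something like `chunks_to_srt(pipe(sample, chunk_length_s=30, return_timestamps=True, generate_kwargs={"language": language, "task": task})["chunks"])` would produce subtitles for `sample`.
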
## Training Details

### Training Data

Japanese subset of the Common Voice 16.0 dataset (`mozilla-foundation/common_voice_16_0`).

### Training Procedure

Trained on Google Colab with a T4 GPU for about 6 hours.

## Evaluation

Testing is still in progress; no WER results have been reported yet.

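The card lists word error rate (WER) as its metric: the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference words. Written Japanese does not put spaces between words, so splitting on whitespace would treat a whole sentence as one "word"; Japanese ASR results are therefore often reported as character error rate (CER), or as WER after segmenting the text with a morphological analyzer. The pure-Python sketch below computes both from the same edit-distance routine. The function names are illustrative, and this is not necessarily how this model will be evaluated.

```python
# Minimal WER/CER sketch based on Levenshtein (edit) distance.
# Illustrative only; not this model's evaluation script.


def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, splitting both strings on whitespace."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring whitespace."""
    ref_chars = list("".join(reference.split()))
    hyp_chars = list("".join(hypothesis.split()))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


print(wer("the cat sat", "the cat sat down"))  # one insertion over 3 words
print(cer("今日は晴れです", "今日は雨です"))     # one substitution, one deletion
```

The Hugging Face `evaluate` library also provides `wer` and `cer` metrics for the same quantities.
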
### Framework versions

- PEFT 0.7.1