---
library_name: peft
base_model: openai/whisper-large-v2
datasets:
- mozilla-foundation/common_voice_16_0
language:
- ja
metrics:
- wer
---
# Model Card for openai-whisper-large-v2-LORA-ja

A PEFT LoRA adapter for Japanese transcription with Whisper Large V2. Testing is still in progress; the main personal use case is transcribing Japanese comedy. Inference uses roughly 9 GB of VRAM with this LoRA applied.
## Model Details
### Model Description
openai-whisper-large-v2-LORA-ja

- **Developed by:** FZNX
- **Model type:** PEFT LoRA adapter
- **Language(s) (NLP):** Japanese (fine-tuned on Common Voice 16.0)
- **License:** [More Information Needed]
- **Finetuned from model:** openai/whisper-large-v2
## How to Get Started with the Model
Load the base model, apply the LoRA adapter, and build an ASR pipeline:

```python
import torch
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperTokenizer,
    WhisperProcessor,
)
from peft import PeftModel, PeftConfig

peft_model_id = "fznx92/openai-whisper-large-v2-ja-transcribe-colab"
sample = "insert mp3 file location here"
language = "japanese"
task = "transcribe"

# Load the base model named in the adapter config, then apply the LoRA weights.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path,
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.to("cuda").half()

processor = WhisperProcessor.from_pretrained(
    peft_config.base_model_name_or_path, language=language, task=task
)
pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    batch_size=8,
    torch_dtype=torch.float16,
    device="cuda:0",
)

def transcribe(audio, return_timestamps=False):
    # Split long audio into 30-second chunks and force Japanese transcription.
    text = pipe(
        audio,
        chunk_length_s=30,
        return_timestamps=return_timestamps,
        generate_kwargs={"language": language, "task": task},
    )["text"]
    return text

transcript = transcribe(sample)
print(transcript)
```
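If ~9 GB of VRAM is too much, the base model can be loaded in 8-bit before applying the adapter. This is a minimal sketch and not part of the original recipe; it assumes `bitsandbytes` is installed.

```python
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration
from peft import PeftModel, PeftConfig

peft_config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model quantized to 8-bit; accelerate places it on the GPU.
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, peft_model_id)
# Skip .to("cuda").half() here, and omit the `device` argument when
# building the pipeline, since accelerate already manages placement.
```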
### Training Data
The Japanese subset of [mozilla-foundation/common_voice_16_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_0) (Common Voice 16.0).
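For reference, the Japanese split can be loaded with the `datasets` library. This is a sketch; the exact splits and filtering used for training are not documented in this card.

```python
from datasets import load_dataset

# Common Voice is gated: accept the terms on the Hub and log in first.
common_voice = load_dataset(
    "mozilla-foundation/common_voice_16_0",
    "ja",
    split="train+validation",
    trust_remote_code=True,  # needed with recent datasets versions
)
```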
### Training Procedure
Fine-tuned with PEFT LoRA on a Google Colab T4 GPU for roughly 6 hours.
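The exact hyperparameters are not recorded in this card. The sketch below shows a typical LoRA setup for Whisper with PEFT; the values (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are illustrative assumptions, not the settings actually used.

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# Illustrative values only; the actual training configuration is undocumented.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```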
## Evaluation
Evaluation is still in progress; word error rate (WER) on Common Voice 16.0 Japanese will be reported once testing is complete.
### Framework versions
- PEFT 0.7.1