---
library_name: peft
base_model: openai/whisper-large-v2
---

Use `language="Georgian"` for inference.

# Inference
```python
import torch
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperTokenizer,
    WhisperProcessor,
)
from peft import PeftModel, PeftConfig

peft_model_id = "anzorq/openai-whisper-large-v2-LORA-colab"
# peft_model_id = "/content/whisper_large_kbd_lora/checkpoint-64"
language = "Georgian"
task = "transcribe"

# Load the base model in 8-bit, then attach the LoRA adapter on top of it.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)

tokenizer = WhisperTokenizer.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)
processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)
feature_extractor = processor.feature_extractor
# Force the language and task tokens at the start of decoding.
forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)
pipe = AutomaticSpeechRecognitionPipeline(model=model, tokenizer=tokenizer, feature_extractor=feature_extractor)


def transcribe(path_to_audio):
    """Transcribe the audio file at `path_to_audio` and return the text."""
    with torch.cuda.amp.autocast():
        text = pipe(path_to_audio, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]
    return text


path_to_audio = "audio.wav"  # placeholder: replace with the path to your audio file
transcribe(path_to_audio)
```

## Training Details

### Training Data

[More Information Needed]


## Training procedure

The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: True
- load_in_4bit: False
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32
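
The values above correspond to constructor arguments of `BitsAndBytesConfig` in `transformers`. The following is a minimal sketch of how they could be reproduced, not the training script that was actually used. The names `TRAINING_QUANT_KWARGS` and `build_quantization_config` are illustrative, and `quant_method` is left out on the assumption that the config class sets it itself.

```python
# Values copied from the quantization config listed above.
TRAINING_QUANT_KWARGS = {
    "load_in_8bit": True,
    "load_in_4bit": False,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "float32",
}


def build_quantization_config(kwargs=TRAINING_QUANT_KWARGS):
    """Build a BitsAndBytesConfig from the listed values (hypothetical helper)."""
    # Imported lazily so the dictionary can be inspected without transformers installed.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**kwargs)
```

The resulting object could then be passed as `quantization_config=build_quantization_config()` to `WhisperForConditionalGeneration.from_pretrained` in place of `load_in_8bit=True`.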

### Framework versions

- PEFT 0.6.0.dev0
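
Because the adapter was saved with a development build of PEFT, it can help to check which versions are installed locally if loading fails. A minimal sketch using only the standard library; the helper name `installed_version` and the list of packages are illustrative assumptions, not part of the original card.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages used by the inference snippet; adjust the list as needed.
for name in ("peft", "transformers", "torch", "bitsandbytes"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```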