---
license: mit
language:
- ja
base_model:
- kotoba-tech/kotoba-whisper-v2.2
pipeline_tag: automatic-speech-recognition
---
# Whisper kotoba-whisper-v2.2 model for CTranslate2

This repository contains the conversion of [kotoba-tech/kotoba-whisper-v2.2](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.2) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.

This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/systran/faster-whisper).
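
To use the model with the CTranslate2 Python API directly, the flow is: compute log-Mel features, build a decoder prompt, and generate. Below is a minimal sketch adapted from the CTranslate2 Whisper documentation; the use of librosa for audio loading and loading the processor from the original checkpoint are assumptions, not part of this repository.

```python
import ctranslate2
import librosa
import transformers

# Load and resample the audio to the 16 kHz mono input Whisper expects
# (librosa is an assumption here; any 16 kHz mono float array works).
audio, _ = librosa.load("./123.wav", sr=16000, mono=True)

# Compute log-Mel features for the first 30-second window using the
# processor from the original checkpoint.
processor = transformers.WhisperProcessor.from_pretrained("kotoba-tech/kotoba-whisper-v2.2")
inputs = processor(audio, return_tensors="np", sampling_rate=16000)
features = ctranslate2.StorageView.from_array(inputs.input_features)

# Load the converted model and describe the task in the decoder prompt.
model = ctranslate2.models.Whisper("./kotoba-whisper-v2.2-faster")
prompt = processor.tokenizer.convert_tokens_to_ids(
    ["<|startoftranscript|>", "<|ja|>", "<|transcribe|>", "<|notimestamps|>"]
)

# Run generation and decode the token ids back to text.
results = model.generate(features, [prompt])
print(processor.decode(results[0].sequences_ids[0]))
```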
# Example

Install the library and prepare a sample audio file.

```bash
pip install faster-whisper
```
Run inference with kotoba-whisper-v2.2-faster.

```python
from faster_whisper import WhisperModel

if __name__ == '__main__':
    # Load the converted model from the local directory.
    model = WhisperModel(model_size_or_path="./kotoba-whisper-v2.2-faster", device="cuda", compute_type="float32", local_files_only=True)

    # Transcribe in 5-second chunks; hotwords bias decoding toward the given term.
    segments, info = model.transcribe(audio="./123.wav", language="ja", chunk_length=5, condition_on_previous_text=False, hotwords="ノイミー")

    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
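
Continuing the example above, the `info` value (faster-whisper's `TranscriptionInfo`) carries metadata about the run; a short sketch of reading it:

```python
# Language reflects the forced "ja" setting here, since passing
# language= skips automatic language detection.
print("Language: %s (probability %.2f)" % (info.language, info.language_probability))
print("Duration: %.2fs" % info.duration)
```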
# Conversion details

The original model was converted with the following command:

```bash
ct2-transformers-converter --model kotoba-tech/kotoba-whisper-v2.2 --output_dir kotoba-whisper-v2.2-faster --copy_files tokenizer.json preprocessor_config.json --quantization float32
```

Note that the model weights are saved in FP32. This type can be changed when the model is loaded using the [compute_type option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
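
For example, the FP32 checkpoint can be loaded with on-the-fly int8 quantization; a minimal sketch, where the CPU device choice is just an illustration:

```python
from faster_whisper import WhisperModel

# The weights on disk are FP32; compute_type converts them at load time,
# so the same files can run as int8 on CPU or float16 on GPU.
model = WhisperModel(model_size_or_path="./kotoba-whisper-v2.2-faster", device="cpu", compute_type="int8")
```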
# More information

For more information about kotoba-whisper-v2.2, refer to the original [model card](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.2).
|