metadata

language: ja
tags:
  - audio
  - automatic-speech-recognition
license: apache-2.0

Kotoba-Whisper: kotoba-whisper-v1.0 for Whisper.cpp

This repository contains the model weights for kotoba-tech/kotoba-whisper-v1.0 converted to GGML format. GGML is the weight format expected by C/C++ packages such as Whisper.cpp, for which we provide an example below.

Usage

Kotoba-Whisper can be run with the Whisper.cpp package, using the original sequential long-form transcription algorithm.

Steps for getting started:

Clone the Whisper.cpp repository:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

Download the GGML weights for kotoba-tech/kotoba-whisper-v1.0:

wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/ggml-kotoba-whisper-v1.0.bin -P ./models
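The weight files are large, so it may be convenient to skip the download when a file is already in ./models. The following is a minimal sketch of that idea; `fetch_model` and `REPO_URL` are hypothetical names introduced here for illustration, and the URL is the one used in the command above.

```shell
# Base URL of this repository's files (same as in the wget command above).
REPO_URL="https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main"

# fetch_model NAME: hypothetical helper that downloads NAME into ./models
# unless a non-empty file with that name is already there.
fetch_model() {
  name="$1"
  dest="./models/$name"
  if [ -s "$dest" ]; then
    echo "already present: $dest"
  else
    wget "$REPO_URL/$name" -P ./models
  fi
}

# Example call (left commented out so that sourcing this file does not hit the network):
# fetch_model ggml-kotoba-whisper-v1.0.bin
```

The check only tests that the file exists and is non-empty; it does not verify that an earlier download completed.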

Run inference using the provided sample audio:

wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/sample_ja_speech.wav
make -j && ./main -m models/ggml-kotoba-whisper-v1.0.bin -f sample_ja_speech.wav -oj transcription.json -ml 30

Note that the tool accepts only 16-bit WAV files, so convert your input before running it. For example, the following ffmpeg command converts an MP3 file to a 16 kHz, mono, 16-bit PCM WAV file:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
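To confirm that a converted file has the expected format before transcribing it, the WAV header can be inspected directly. The sketch below is one way to do that with `od`; `wav_info` is a hypothetical helper name, and the code assumes that the "fmt " chunk immediately follows the RIFF header and that the machine is little-endian, which is not guaranteed for every file or host.

```shell
# wav_info FILE: hypothetical helper that prints channel count, sample rate,
# and bits per sample read from fixed offsets in the WAV header.
# Assumptions: the "fmt " chunk starts at byte 12, and the host is
# little-endian (od reads multi-byte integers in host byte order).
wav_info() {
  file="$1"
  # Bytes 22-23: number of channels
  channels=$(od -An -tu2 -j22 -N2 "$file" | tr -d ' ')
  # Bytes 24-27: sample rate in Hz
  rate=$(od -An -tu4 -j24 -N4 "$file" | tr -d ' ')
  # Bytes 34-35: bits per sample
  bits=$(od -An -tu2 -j34 -N2 "$file" | tr -d ' ')
  echo "channels=$channels rate=$rate bits=$bits"
}

# Example call (commented out; output.wav is the file produced by ffmpeg above):
# wav_info output.wav
```

For a file produced by the ffmpeg command above, the values to look for are 1 channel, a rate of 16000, and 16 bits.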

Quantized Model

To use the quantized model, download the quantized GGML weights:

wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/ggml-kotoba-whisper-v1.0-q5_0.bin -P ./models

Run inference on the sample audio:

make -j && ./main -m models/ggml-kotoba-whisper-v1.0-q5_0.bin -f sample_ja_speech.wav -oj transcription-quantized.json -ml 30
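When several recordings need transcribing, the same command can be wrapped in a loop. This is a rough sketch rather than part of Whisper.cpp: `transcribe_dir`, `WHISPER_BIN`, and `MODEL` are hypothetical names, and it assumes the binary has already been built with `make -j` and that every input is already a 16-bit WAV file.

```shell
# Path to the Whisper.cpp binary and the model; both can be overridden
# from the environment. The defaults match the commands above.
WHISPER_BIN="${WHISPER_BIN:-./main}"
MODEL="${MODEL:-models/ggml-kotoba-whisper-v1.0-q5_0.bin}"

# transcribe_dir DIR: hypothetical helper that runs the model on every
# .wav file in DIR, one file at a time.
transcribe_dir() {
  dir="$1"
  for wav in "$dir"/*.wav; do
    # Skip the unexpanded pattern when DIR contains no .wav files.
    [ -e "$wav" ] || continue
    "$WHISPER_BIN" -m "$MODEL" -f "$wav" -ml 30
  done
}

# Example call (commented out; ./recordings is a placeholder directory):
# transcribe_dir ./recordings
```

Setting `MODEL=models/ggml-kotoba-whisper-v1.0.bin` before the call switches the loop to the unquantized weights.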

Model Details

For more information about kotoba-whisper-v1.0, refer to the original model card.