asahi417 committed on
Commit c5bbccf · verified · 1 Parent(s): 649f58a

Update README.md

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -40,6 +40,22 @@ Note that it runs only with 16-bit WAV files, so make sure to convert your input
  ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
  ```
 
+ ### Benchmark
+ We measured the inference speed for four different Japanese speech recordings on a MacBook Pro with the following spec:
+ - Apple M2 Pro
+ - 32GB
+ - 14-inch, 2023
+ - macOS Sonoma Version 14.4.1 (23E224)
+
+
+ | audio duration (min) | inference time (sec) |
+ |----------------------|----------------------|
+ | 50.3                 | 581                  |
+ | 5.6                  | 41                   |
+ | 4.9                  | 30                   |
+ | 5.6                  | 35                   |
+
+
  ### Quantized Model
  To use the quantized model, download the quantized GGML weights:
 
@@ -52,6 +68,8 @@ Run inference on the sample audio:
  make -j && ./main -m models/ggml-kotoba-whisper-v1.0-q5_0.bin -f sample_ja_speech.wav --output-file transcription.quantized --output-json
  ```
 
+ Note that the benchmark results are almost identical to those of the raw non-quantized model weights.
+
 
  ## Model Details
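
The benchmark timings added above could be reproduced with the whisper.cpp `main` binary and the shell's built-in `time`. The sketch below is illustrative only: the audio file names are placeholders, and the non-quantized weight path `models/ggml-kotoba-whisper-v1.0.bin` is an assumption inferred from the quantized file name used in the README.

```bash
# Illustrative sketch: time transcription of several 16 kHz, 16-bit mono WAV files.
# File names are placeholders; replace them with your own recordings.
for f in audio_50min.wav audio_5min_a.wav audio_5min_b.wav audio_5min_c.wav; do
  echo "== $f =="
  # `time` reports the wall-clock duration of the full transcription run.
  time ./main -m models/ggml-kotoba-whisper-v1.0.bin -l ja -f "$f" > /dev/null
done
```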
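
If the q5_0 weights were produced locally rather than downloaded, whisper.cpp's `quantize` tool could be used. A minimal sketch, assuming the tool has been built alongside `main` and that the full-precision GGML weights sit at the hypothetical path `models/ggml-kotoba-whisper-v1.0.bin`:

```bash
# Illustrative sketch: derive q5_0 weights from the full-precision GGML file.
# Arguments: <input weights> <output weights> <quantization type>
./quantize models/ggml-kotoba-whisper-v1.0.bin models/ggml-kotoba-whisper-v1.0-q5_0.bin q5_0
```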