Update README.md
README.md CHANGED
@@ -62,6 +62,21 @@ Also, currently whisper.cpp and faster-whisper support the [sequential long-form
 and only Huggingface pipeline supports the [chunked long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#chunked-long-form), which we empirically
 found better than the sequential long-form decoding.
 
+
+### Quantized Model
+To use the quantized model, download the quantized GGML weights:
+
+```bash
+wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/ggml-kotoba-whisper-v1.0-q5_0.bin -P ./models
+```
+
+Run inference on the sample audio:
+```bash
+make -j && ./main -m models/ggml-kotoba-whisper-v1.0-q5_0.bin -f sample_ja_speech.wav --output-file transcription.quantized --output-json
+```
+
+Note that the benchmark results are almost identical to the raw non-quantized model weight.
+
 ### Conversion details
 The original model was converted with the following command:
 
@@ -77,23 +92,12 @@ git clone https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0
 # convert to ggml
 python3 ./convert-h5-to-ggml.py ./kotoba-whisper-v1.0/ ../../whisper .
 mv ggml-model.bin ggml-kotoba-whisper-v1.0
-```
-
-### Quantized Model
-To use the quantized model, download the quantized GGML weights:
-
-```bash
-wget https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-ggml/resolve/main/ggml-kotoba-whisper-v1.0-q5_0.bin -P ./models
-```
 
-
-
-
+# quantize ggml model
+cd ../
+./quantize models/ggml-kotoba-whisper-v1.0.bin models/ggml-kotoba-whisper-v1.0-q5_0.bin q5_0
 ```
 
-Note that the benchmark results are almost identical to the raw non-quantized model weight.
-
-
 ## Model Details
 
 For more information about the kotoba-whisper-v1.0, refer to the original [model card](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0).
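As a reference for the chunked long-form decoding mentioned in the README text above, here is a minimal sketch using the Hugging Face pipeline; the `chunk_length_s` and `batch_size` values are illustrative assumptions, not taken from this diff:

```python
# Minimal sketch: chunked long-form decoding via the Hugging Face ASR pipeline.
# chunk_length_s/batch_size are illustrative; tune them for your hardware.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    "automatic-speech-recognition",
    model="kotoba-tech/kotoba-whisper-v1.0",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device=device,
    chunk_length_s=15,  # chunked long-form decoding instead of sequential
    batch_size=16,      # chunks are transcribed in parallel batches
)

result = pipe("sample_ja_speech.wav", generate_kwargs={"language": "ja", "task": "transcribe"})
print(result["text"])
```

Chunked decoding splits long audio into fixed-length windows that can be transcribed in parallel batches, which is why it tends to run faster than sequential decoding on long inputs.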
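Since the inference command above passes `--output-file transcription.quantized --output-json`, whisper.cpp writes the result to `transcription.quantized.json`, which can then be post-processed. A hedged sketch follows; the `transcription`, `offsets`, and `text` keys are assumed from whisper.cpp's JSON output format and may differ across versions:

```python
# Sketch: read the JSON that whisper.cpp's --output-json flag writes next to
# --output-file (here assumed to be transcription.quantized.json). The key
# names below are assumptions about whisper.cpp's output layout.
import json

with open("transcription.quantized.json", encoding="utf-8") as f:
    data = json.load(f)

# Each entry is assumed to carry millisecond offsets and text for one segment.
for segment in data["transcription"]:
    start_ms = segment["offsets"]["from"]
    end_ms = segment["offsets"]["to"]
    print(f"[{start_ms / 1000:7.2f}s - {end_ms / 1000:7.2f}s] {segment['text'].strip()}")
```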