---
sdk: gradio
sdk_version: 5.16.0
---
# Whisper-WebUI
A Gradio-based browser interface for Whisper
## Features
- Select the Whisper implementation you want to use between:
  - [openai/whisper](https://github.com/openai/whisper)
  - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
  - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
- Generate transcriptions from various sources, including files & microphone (see the transcription sketch after this list)
- Currently supported output formats: csv, srt & txt
- Speech to Text Translation:
  - From other languages to English (this is Whisper's end-to-end speech-to-text translation feature)
  - Translate transcription files using Facebook NLLB models (see the NLLB sketch after this list)
- Pre-processing audio input with Silero VAD (see the VAD sketch after this list)
- Post-processing with speaker diarization using the pyannote model (see the diarization sketch after this list):
  - To download the pyannote model, you need a Hugging Face token and must manually accept the terms on the pyannote model pages below:
    - [https://huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
    - [https://huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
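To make the default backend concrete, here is a minimal faster-whisper transcription sketch; the model size, device, and file name are illustrative choices, not the exact settings Whisper-WebUI applies.

```python
# Minimal faster-whisper sketch (illustrative settings, not Whisper-WebUI's exact code)
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:  # `segments` is a generator; iterating runs the transcription
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```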
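The NLLB translation step can likewise be sketched with the Hugging Face transformers pipeline; the checkpoint and FLORES-200 language codes below are assumptions for illustration.

```python
# Sketch: translating transcribed text with an NLLB checkpoint via transformers
# (model name and language codes are illustrative assumptions)
from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="kor_Hang",  # source language, FLORES-200 code
    tgt_lang="eng_Latn",  # target language, FLORES-200 code
)
print(translator("안녕하세요, 반갑습니다.")[0]["translation_text"])
```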
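For the VAD pre-processing, Silero VAD is typically loaded through torch.hub as below; this is a generic usage sketch rather than the WebUI's exact code path.

```python
# Sketch: finding speech regions with Silero VAD before transcription
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("audio.wav", sampling_rate=16000)
# Each entry is a {'start': ..., 'end': ...} speech span (in samples)
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)
```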
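And the diarization post-processing boils down to a pyannote.audio pipeline call like the following; replace `HF_TOKEN` with your own Hugging Face token after accepting the model terms on the pages above.

```python
# Sketch: speaker diarization with pyannote.audio (needs accepted terms + an HF token)
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # your Hugging Face access token
)
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```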
## Installation and Running
### Run Locally
#### Prerequisite
To run this WebUI, you need to have `git`, `python` (version 3.8 ~ 3.10), and `FFmpeg`.
If you're not using an Nvidia GPU, or are using a CUDA version other than 12.4, edit `requirements.txt` to match your environment.

Please follow the links below to install the necessary software:
- git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
- python : [https://www.python.org/downloads/](https://www.python.org/downloads/)
- FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
- CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
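For instance, switching to a different CUDA build of PyTorch usually means changing the wheel index that `requirements.txt` points at; the exact lines in this project's file may differ, so treat this excerpt as a hypothetical illustration.

```
# Hypothetical requirements.txt excerpt: swap the index to match your setup
# (e.g. cu121 or cu118 instead of cu124, or cpu for machines without an Nvidia GPU)
--extra-index-url https://download.pytorch.org/whl/cu121
torch
torchaudio
```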
After installing FFmpeg, make sure to add the `FFmpeg/bin` folder to your system PATH.

#### Installation Using the Script Files
1. Download the repository and extract its contents.
2. Run `install.bat` or `install.sh` to install dependencies (it will create a `venv` directory and install dependencies there).
3. Start the WebUI with `start-webui.bat` or `start-webui.sh` (it will run `python app.py` after activating the venv).
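If you'd rather not use the scripts, the equivalent manual steps look roughly like this (Linux/macOS shown; on Windows, activate with `venv\Scripts\activate`):

```sh
# Roughly what install.sh and start-webui.sh do, as manual steps
cd Whisper-WebUI                 # the extracted repository directory
python -m venv venv              # create the virtual environment the scripts use
source venv/bin/activate         # on Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py                    # launches the Gradio WebUI
```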
### Running with Docker
- Install and launch Docker-Desktop
- Get the repository
- Build the image (the image is about ~7GB):

  ```sh
  docker compose build
  ```

- Run the container:

  ```sh
  docker compose up
  ```

- Connect to the WebUI with your browser at http://localhost:7860

Note: If needed, update the `docker-compose.yaml` to match your environment.
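Common tweaks include the published port and the GPU reservation; the service name below is an assumption, so match it to the one actually used in the repository's file.

```yaml
# Hypothetical docker-compose.yaml tweaks; the service name is an assumption
services:
  whisper-webui:
    ports:
      - "8080:7860"              # serve the WebUI on host port 8080 instead of 7860
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia     # drop this block to run CPU-only
              count: all
              capabilities: [gpu]
```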
## VRAM Usages
This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.
According to faster-whisper, the efficiency of the optimized whisper model is as follows:

| Implementation | Precision | Beam size | Time  | Max. GPU memory | Max. CPU memory |
|----------------|-----------|-----------|-------|-----------------|-----------------|
| openai/whisper | fp16      | 5         | 4m30s | 11325MB         | 9439MB          |
| faster-whisper | fp16      | 5         | 54s   | 4755MB          | 3244MB          |

Whisper's original VRAM usage table for available models:
| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
| base   | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
| small  | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
Note: `.en` models are for English only, and you can use the `Translate to English` option with the multilingual models.
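If the figures above don't fit your GPU, faster-whisper can also load quantized weights; a minimal sketch, with the model size and compute type as illustrative choices:

```python
# Sketch: trading precision for VRAM via faster-whisper quantization
from faster_whisper import WhisperModel

# int8_float16 stores weights in int8 and computes in fp16, which
# typically needs far less GPU memory than the fp16 rows above.
model = WhisperModel("medium", device="cuda", compute_type="int8_float16")
segments, _ = model.transcribe("audio.mp3")
print(" ".join(segment.text.strip() for segment in segments))
```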