---
sdk: gradio
sdk_version: 5.16.0
---
# Whisper-WebUI
A Gradio-based browser interface for Whisper
## Features
- Select the Whisper implementation you want to use between:
  - openai/whisper
  - SYSTRAN/faster-whisper (used by default)
  - Vaibhavs10/insanely-fast-whisper
- Generate transcriptions from various sources, including files & microphone
- Currently supported output formats: csv, srt & txt
- Speech to Text Translation:
  - From other languages to English (this is Whisper's end-to-end speech-to-text translation feature)
  - Translate transcription files using Facebook NLLB models
- Pre-processing audio input with Silero VAD
- Post-processing with speaker diarization using the pyannote model:
  - To download the pyannote model, you need to have a Hugging Face token and manually accept the model terms on the pyannote model pages on Hugging Face (see the authentication sketch after this list)
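After accepting the terms, the token also has to be available on the machine running the WebUI. As a minimal sketch (how you supply the token is an assumption, not something this README specifies), you can cache it with the Hugging Face CLI:

```sh
# Logs in interactively and caches the token for huggingface_hub to pick up
huggingface-cli login
```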
## Installation and Running
### Run Locally
#### Prerequisite
To run this WebUI, you need `git`, `python` version 3.8 ~ 3.10, and `FFmpeg`.

If you're not using an Nvidia GPU, or are using a CUDA version other than 12.4, edit `requirements.txt` to match your environment (see the sketch after the list below).

Please follow the links below to install the necessary software:
- git : https://git-scm.com/downloads
- python : https://www.python.org/downloads/
- FFmpeg : https://ffmpeg.org/download.html
- CUDA : https://developer.nvidia.com/cuda-downloads
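PyTorch publishes wheels per CUDA version under separate index URLs, so the edit usually comes down to changing the index line. A minimal sketch, assuming the file pins `torch` via an `--extra-index-url` line (the actual contents of `requirements.txt` may differ):

```txt
# For CUDA 12.1 instead of 12.4:
--extra-index-url https://download.pytorch.org/whl/cu121
torch

# For a CPU-only machine, use the CPU index instead:
# --extra-index-url https://download.pytorch.org/whl/cpu
```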
After installing FFmpeg, make sure to add the `FFmpeg/bin` folder to your system `PATH`.
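To confirm `PATH` is set correctly, open a new terminal and run:

```sh
# Prints FFmpeg version and build info if FFmpeg/bin is on PATH
ffmpeg -version
```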
#### Installation using the script files
- Download the repository and extract its contents
- Run `install.bat` or `install.sh` to install dependencies (it will create a `venv` directory and install dependencies there; see the manual equivalent after this list)
- Start the WebUI with `start-webui.bat` or `start-webui.sh` (it will run `python app.py` after activating the venv)
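If you prefer to perform the same steps by hand, the scripts roughly amount to the following. A minimal sketch for Linux/macOS; on Windows, activate the venv with `venv\Scripts\activate` instead:

```sh
# What install.sh does: create a venv and install dependencies into it
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# What start-webui.sh does: run the app inside the activated venv
python app.py
```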
### Running with Docker

1. Install and launch Docker-Desktop
2. Get the repository
3. If needed, update `docker-compose.yaml` to match your environment (see the sketch at the end of this section)

Docker commands:

Build the image (the image is about 7 GB):

```sh
docker compose build
```

Run the container:

```sh
docker compose up
```

Connect to the WebUI with your browser at http://localhost:7860
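Two common reasons to edit `docker-compose.yaml` are changing the published port and exposing a GPU to the container. A hypothetical excerpt, assuming a service named `whisper-webui` (the actual service name and keys in the project's compose file may differ):

```yaml
services:
  whisper-webui:        # assumed service name; check docker-compose.yaml
    ports:
      - "7860:7860"     # host:container; change the left side to serve on another port
    deploy:
      resources:
        reservations:
          devices:      # pass an Nvidia GPU through to the container
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```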
## VRAM Usages
This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.
According to faster-whisper, the efficiency of the optimized Whisper model is as follows:

| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
|----------------|-----------|-----------|------|-----------------|-----------------|
| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |

Whisper's original VRAM usage table for available models:
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |
| base | 74 M | `base.en` | `base` | ~1 GB | ~16x |
| small | 244 M | `small.en` | `small` | ~2 GB | ~6x |
| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
| large | 1550 M | N/A | `large` | ~10 GB | 1x |

Note: `.en` models are for English only, and you can use the `Translate to English` option with the other models.