metadata
sdk: gradio
sdk_version: 5.16.0

Whisper-WebUI

A Gradio-based browser interface for Whisper

Features

Installation and Running

  • Run Locally

    Prerequisite

    To run this WebUI, you need git, Python 3.8 to 3.10, and FFmpeg.
    If you're not using an Nvidia GPU, or you're using a CUDA version other than 12.4, edit requirements.txt to match your environment.

    Please follow the links below to install the necessary software:

    After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH.
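The prerequisites above can be checked with a short script before installing. This is a minimal sketch, not part of the repository: the helper names `python_version_ok` and `missing_tools` are made up for illustration, and it assumes the FFmpeg executable is named `ffmpeg` on your PATH.

```python
import shutil
import sys

# Supported interpreter range as stated in the prerequisites (3.8 to 3.10, inclusive).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 10)


def python_version_ok(version=None):
    """Return True if the (major, minor) version is inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def missing_tools(tools=("git", "ffmpeg")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Python %d.%d is outside the supported 3.8 to 3.10 range" % sys.version_info[:2])
    for tool in missing_tools():
        print(f"{tool} was not found on PATH")
```

If the script prints nothing, the interpreter version is in range and both tools were found on PATH.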

    Installation using the script files

    1. Download the repository and extract its contents
    2. Run install.bat or install.sh to install dependencies (it creates a venv directory and installs the dependencies there)
    3. Start the WebUI with start-webui.bat or start-webui.sh (it activates the venv and then runs python app.py)
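For readers who prefer to run the steps by hand, the following sketch lists commands that are roughly equivalent to what the scripts are described as doing. It is an illustration under assumptions, not the contents of the actual script files: it assumes requirements.txt and app.py sit in the repository root, and the helper names `venv_python` and `manual_commands` are hypothetical. Calling the interpreter inside the venv directly has the same effect as activating the venv first.

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_python(venv_dir="venv", windows=False):
    """Return the path of the Python interpreter inside a virtual environment."""
    if windows:
        return str(PureWindowsPath(venv_dir) / "Scripts" / "python.exe")
    return str(PurePosixPath(venv_dir) / "bin" / "python")


def manual_commands(base_python="python", venv_dir="venv", windows=False):
    """Build the command lines for: create venv, install dependencies, start the app."""
    py = venv_python(venv_dir, windows)
    return [
        [base_python, "-m", "venv", venv_dir],                   # create the venv directory
        [py, "-m", "pip", "install", "-r", "requirements.txt"],  # install dependencies into it
        [py, "app.py"],                                          # start the WebUI
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is changed on disk.
    for command in manual_commands():
        print(" ".join(command))
```

Each inner list can be passed to `subprocess.run(command, check=True)` from the repository root if you want to execute the steps rather than print them.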
  • Running with Docker

    1. Install and launch Docker-Desktop

    2. Get the repository

    3. If needed, update the docker-compose.yaml to match your environment

    4. Docker commands:

      Build the image (about 7 GB)

      docker compose build 
      

      Run the container

      docker compose up
      
    5. Connect to the WebUI with your browser at http://localhost:7860
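After `docker compose up`, the container may need a moment before the page loads. The sketch below checks whether anything answers at the address given in step 5. It uses only the Python standard library; the function name `webui_is_up` is hypothetical, and the check only confirms that an HTTP server responds, not that it is the WebUI.

```python
import http.client
import urllib.error
import urllib.request


def webui_is_up(url="http://localhost:7860", timeout=3.0):
    """Return True if an HTTP server answers at `url` with a success or redirect status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("WebUI reachable" if webui_is_up() else "WebUI not reachable yet")
```

If the port mapping in docker-compose.yaml was changed in step 3, pass the matching URL to `webui_is_up`.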

VRAM Usage

  • This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.
    According to faster-whisper, the efficiency of the optimized whisper model is as follows:

    | Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
    |---|---|---|---|---|---|
    | openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
    | faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
  • Whisper's original VRAM usage table for available models:

    | Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
    |---|---|---|---|---|---|
    | tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
    | base | 74 M | base.en | base | ~1 GB | ~16x |
    | small | 244 M | small.en | small | ~2 GB | ~6x |
    | medium | 769 M | medium.en | medium | ~5 GB | ~2x |
    | large | 1550 M | N/A | large | ~10 GB | 1x |

    Note: .en models are English-only. To translate other languages into English, use the Translate to English option with one of the multilingual models.
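The figures in the two tables can be turned into quick estimates. The sketch below copies the numbers from the tables above; the function names are hypothetical. Since the VRAM requirements come from the original Whisper table, picking a model with them should be a conservative estimate when running through faster-whisper, which the benchmark shows using less memory.

```python
# Benchmark figures copied from the faster-whisper comparison table.
BENCHMARK = {
    "openai/whisper": {"seconds": 4 * 60 + 30, "gpu_mb": 11325, "cpu_mb": 9439},
    "faster-whisper": {"seconds": 54, "gpu_mb": 4755, "cpu_mb": 3244},
}

# (model size, approximate required VRAM in GB) from Whisper's table, smallest first.
MODEL_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def speedup():
    """How many times faster faster-whisper was than openai/whisper in the benchmark."""
    return BENCHMARK["openai/whisper"]["seconds"] / BENCHMARK["faster-whisper"]["seconds"]


def memory_saving(key):
    """Fraction of memory saved by faster-whisper for `key` ('gpu_mb' or 'cpu_mb')."""
    baseline = BENCHMARK["openai/whisper"][key]
    optimized = BENCHMARK["faster-whisper"][key]
    return 1 - optimized / baseline


def largest_model_for(vram_gb):
    """Return the largest model whose listed VRAM requirement fits, or None."""
    fitting = [name for name, needed in MODEL_VRAM_GB if needed <= vram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    print(f"Speedup: {speedup():.1f}x")
    print(f"GPU memory saved: {memory_saving('gpu_mb'):.0%}")
    print(f"CPU memory saved: {memory_saving('cpu_mb'):.0%}")
    print(f"Largest model for 8 GB of VRAM: {largest_model_for(8)}")
```

With the table values, this works out to a 5x speedup and roughly 58% less GPU memory for faster-whisper in that benchmark.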