---
sdk: gradio
sdk_version: 5.16.0
---

Whisper-WebUI

A Gradio-based browser interface for Whisper

Features

Installation and Running

  • Run Locally

    Prerequisite

    To run this WebUI, you need git, Python 3.8 ~ 3.10, and FFmpeg.
    If you are not using an Nvidia GPU, or are using a CUDA version other than 12.4, edit the file requirements.txt to match your environment.
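
    For example, a CPU-only machine might point pip at PyTorch's CPU wheel index instead of a CUDA one. The fragment below is illustrative only, not the shipped file's actual contents; check requirements.txt before editing:

    ```
    # Illustrative requirements.txt fragment (the shipped file may differ).
    # CUDA 12.4 wheel index (the default this project targets):
    #   --extra-index-url https://download.pytorch.org/whl/cu124
    # CPU-only alternative:
    --extra-index-url https://download.pytorch.org/whl/cpu
    torch
    torchaudio
    ```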

    Please follow the links below to install the necessary software:
    - git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
    - python : [https://www.python.org/downloads/](https://www.python.org/downloads/)
    - FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
    - CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
    

    After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH
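
    Since only Python 3.8 ~ 3.10 is supported, it can be worth verifying the interpreter on your PATH before installing. A minimal POSIX-sh sketch (the `supported_python` helper is our own, not part of this project):

    ```shell
    # Check whether a Python version string falls in the supported 3.8-3.10 range.
    supported_python() {
      # $1 is a version string such as "3.9.7"
      major=${1%%.*}
      rest=${1#*.}
      minor=${rest%%.*}
      [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -le 10 ]
    }

    supported_python "3.10.6" && echo "3.10.6 is supported"
    supported_python "3.12.1" || echo "3.12.1 is not supported"
    ```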

    Installation Using the Script Files

    1. Download the repository and extract its contents
    2. Run `install.bat` (Windows) or `install.sh` (Linux/macOS) to install dependencies (it will create a `venv` directory and install the dependencies there)
    3. Start the WebUI with `start-webui.bat` or `start-webui.sh` (it will run `python app.py` after activating the venv)
    
  • Running with Docker

    1. Install and launch Docker-Desktop

    2. Get the repository

    3. Build the image (the image is about ~7 GB):

       ```
       docker compose build
       ```

    4. Run the container:

       ```
       docker compose up
       ```

    5. Connect to the WebUI with your browser at http://localhost:7860

    Note: If needed, update the docker-compose.yaml to match your environment
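
    As an example, two common tweaks are publishing the WebUI on a different host port and reserving a GPU. The snippet below is a hedged sketch using standard Compose keys; the service name `whisper-webui` is an assumption, so use whatever name the shipped docker-compose.yaml actually defines:

    ```yaml
    # Illustrative docker-compose.yaml overrides (service name is assumed).
    services:
      whisper-webui:
        ports:
          - "8080:7860"   # expose the WebUI on host port 8080 instead of 7860
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]
    ```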

VRAM Usage

  • This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.
    According to faster-whisper, the efficiency of the optimized whisper model is as follows:

    | Implementation | Precision | Beam size | Time  | Max. GPU memory | Max. CPU memory |
    |----------------|-----------|-----------|-------|-----------------|-----------------|
    | openai/whisper | fp16      | 5         | 4m30s | 11325MB         | 9439MB          |
    | faster-whisper | fp16      | 5         | 54s   | 4755MB          | 3244MB          |
  • Whisper's original VRAM usage table for available models:

    | Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
    |--------|------------|--------------------|--------------------|---------------|----------------|
    | tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
    | base   | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
    | small  | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
    | medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
    | large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |

Note: `.en` models are English-only; for other languages, use a multilingual model, which also provides the Translate to English option