LAP-DEV committed on
Commit b5a1a25 · verified · 1 Parent(s): 712e41d

Update README.md

Files changed (1)
  1. README.md +54 -71
README.md CHANGED
@@ -3,82 +3,70 @@ sdk: gradio
  sdk_version: 5.16.0
  ---
  # Whisper-WebUI
- A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper).
-
- ## Notebook
- If you wish to try this on Colab, you can do it [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!

  # Features
- - Select the Whisper implementation you want to use between:
  - [openai/whisper](https://github.com/openai/whisper)
  - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
  - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
- - Generate subtitles from various sources, including:
- - Files
- - Microphone
- - Currently supported output formats:
- - csv
- - srt
- - txt
- - Speech to Text Translation
- - From other languages to English. (This is Whisper's end-to-end speech-to-text translation feature)
- - Translate subtitle files using Facebook NLLB models
- - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
- - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
- - To download the pyannote model, you need a Huggingface token and must manually accept the terms on the pages below.
  1. https://huggingface.co/pyannote/speaker-diarization-3.1
  2. https://huggingface.co/pyannote/segmentation-3.0

  # Installation and Running
- ### Prerequisite
- To run this WebUI, you need `git`, `python` version 3.8 ~ 3.10, and `FFmpeg`.<br>
- And if you're not using an Nvidia GPU, or are using a `CUDA` version other than 12.4, edit the [`requirements.txt`](https://github.com/jhj0517/Whisper-WebUI/blob/master/requirements.txt) to match your environment.
-
- Please follow the links below to install the necessary software:
- - git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
- - python : [https://www.python.org/downloads/](https://www.python.org/downloads/) **(If your python version is too new, torch will not install properly.)**
- - FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
- - CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
-
- After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
-
- ### Automatic Installation
-
- 1. Download the `Whisper-WebUI.zip` file for your OS from [v1.0.0](https://github.com/jhj0517/Whisper-WebUI/releases/tag/v1.0.0) and extract its contents.
- 2. Run `install.bat` or `install.sh` to install dependencies. (This will create a `venv` directory and install dependencies there.)
- 3. Start the WebUI with `start-webui.bat` or `start-webui.sh`.
- 4. To update the WebUI, run `update.bat` or `update.sh`.
-
- You can also run the project with command-line arguments; see the [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for a guide to the arguments.
-
- - ## Running with Docker
-
- 1. Install and launch [Docker-Desktop](https://www.docker.com/products/docker-desktop/).
-
- 2. Git clone the repository:
-
- ```sh
- git clone https://github.com/jhj0517/Whisper-WebUI.git
- ```
-
- 3. Build the image (the image is about 7 GB):
-
- ```sh
- docker compose build
- ```
-
- 4. Run the container:
-
- ```sh
- docker compose up
- ```
-
- 5. Connect to the WebUI with your browser at `http://localhost:7860`.
-
- If needed, update the [`docker-compose.yaml`](https://github.com/jhj0517/Whisper-WebUI/blob/master/docker-compose.yaml) to match your environment.
 
  # VRAM Usages
- This project is integrated with [faster-whisper](https://github.com/guillaumekln/faster-whisper) by default for better VRAM usage and transcription speed.

  According to faster-whisper, the efficiency of the optimized whisper model is as follows:
  | Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
@@ -86,12 +74,8 @@ According to faster-whisper, the efficiency of the optimized whisper model is as
  | openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
  | faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |

- If you want to use an implementation other than faster-whisper, pass the `--whisper_type` arg with the repository name.<br>
- Read the [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for more info about CLI args.
-
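As a purely illustrative sketch of that argument (the exact value strings accepted by `--whisper_type` are listed in the wiki page above, so the spellings here are assumptions rather than the canonical syntax):

```sh
# Hypothetical examples of selecting a non-default implementation by repository name.
# Check the Command-Line-Arguments wiki page for the exact accepted values.
python app.py --whisper_type whisper                  # openai/whisper
python app.py --whisper_type insanely_fast_whisper    # Vaibhavs10/insanely-fast-whisper (assumed spelling)
```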
  ## Available models
- This is Whisper's original VRAM usage table for models.
-
  | Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
  |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
  | tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |
@@ -100,5 +84,4 @@ This is Whisper's original VRAM usage table for models.
  | medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
  | large | 1550 M | N/A | `large` | ~10 GB | 1x |

-
- `.en` models are for English only, and the cool thing is that you can use the `Translate to English` option with the "large" models!

  sdk_version: 5.16.0
  ---
  # Whisper-WebUI
+ A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper)

  # Features
+ - Select the Whisper implementation you want to use from:
  - [openai/whisper](https://github.com/openai/whisper)
  - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
  - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
+ - Generate transcriptions from various sources, including **files** & **microphone**
+ - Currently supported output formats: **csv**, **srt** & **txt**
+ - Speech to Text Translation:
+ - From other languages to English (this is Whisper's end-to-end speech-to-text translation feature)
+ - Translate transcription files using Facebook NLLB models
+ - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad)
+ - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model:
+ - To download the pyannote model, you need a Huggingface token and must manually accept the terms on the pages below:
  1. https://huggingface.co/pyannote/speaker-diarization-3.1
  2. https://huggingface.co/pyannote/segmentation-3.0
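For illustration, one common way to make a Hugging Face token available on the machine is the `huggingface_hub` CLI or the `HF_TOKEN` environment variable; whether Whisper-WebUI reads the token this way or asks for it in its own settings is not specified here, so treat this as a sketch rather than the app's documented procedure.

```sh
# Sketch: make a Hugging Face token available locally (assumes the huggingface_hub CLI is acceptable).
pip install huggingface_hub
huggingface-cli login                  # paste a token created at https://huggingface.co/settings/tokens
# or export it for the current shell session:
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx    # placeholder value
```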
 
  # Installation and Running

+ - ## Run Locally
+
+ ### Prerequisite
+ To run this WebUI, you need `git`, `python` version 3.8 ~ 3.10, and `FFmpeg`<br>
+ And if you're not using an Nvidia GPU, or are using a `CUDA` version other than 12.4, edit the file **requirements.txt** to match your environment
+
+ Please follow the links below to install the necessary software:
+ - git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
+ - python : [https://www.python.org/downloads/](https://www.python.org/downloads/)
+ - FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
+ - CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
+
+ After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
+
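As a quick, purely illustrative sanity check (not part of the project's own scripts), each of these commands should print a version once the prerequisites are installed and on `PATH`; the last one only applies to Nvidia GPUs:

```sh
# Illustrative prerequisite check; every command should print a version.
git --version
python --version     # should report a 3.8 - 3.10 release
ffmpeg -version      # fails if FFmpeg/bin is not on the system PATH
nvidia-smi           # optional: shows driver / CUDA info on Nvidia GPUs
```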
+ ### Installation Using the Script Files
+
+ 1. Download the repository and extract its contents
+ 2. Run `install.bat` or `install.sh` to install dependencies (it will create a `venv` directory and install dependencies there)
+ 3. Start the WebUI with `start-webui.bat` or `start-webui.sh` (it will run `python app.py` after activating the venv)
+
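On Linux/macOS the whole sequence might look like the sketch below; it assumes you get the repository with `git` instead of downloading an archive, and the `.bat` scripts are the Windows equivalents:

```sh
# Sketch of a local install on Linux/macOS (Windows uses the .bat scripts instead).
git clone https://github.com/jhj0517/Whisper-WebUI.git
cd Whisper-WebUI
bash install.sh        # creates the venv and installs the dependencies into it
bash start-webui.sh    # activates the venv and runs: python app.py
```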
+ - ## Running with Docker
+
+ 1. Install and launch [Docker-Desktop](https://www.docker.com/products/docker-desktop/)
+
+ 2. Get the repository (e.g. `git clone https://github.com/jhj0517/Whisper-WebUI.git`)
+
+ 3. Build the image (the image is about 7 GB)
+
+ ```sh
+ docker compose build
+ ```
+
+ 4. Run the container
+
+ ```sh
+ docker compose up
+ ```
+
+ 5. Connect to the WebUI with your browser at `http://localhost:7860`
+
+ If needed, update the **docker-compose.yaml** to match your environment
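If you'd rather keep the container in the background, the usual Compose flags apply; this is generic Docker Compose usage, not something specific to this project:

```sh
# Optional: run detached and follow the logs (plain Docker Compose usage).
docker compose up -d
docker compose logs -f    # the WebUI address (http://localhost:7860) appears here once it is up
docker compose down       # stop and remove the container when finished
```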

  # VRAM Usages
+ This project is integrated with [faster-whisper](https://github.com/guillaumekln/faster-whisper) by default for better VRAM usage and transcription speed

  According to faster-whisper, the efficiency of the optimized whisper model is as follows:
  | Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
  |:---:|:---:|:---:|:---:|:---:|:---:|
  | openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
  | faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |

  ## Available models
+ This is Whisper's original VRAM usage table for models:
  | Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
  |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
  | tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |

  | medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
  | large | 1550 M | N/A | `large` | ~10 GB | 1x |

+ `.en` models are for English only, and you can use the `Translate to English` option with the other models