LAP-DEV committed · Commit c435fe8 · verified · 1 Parent(s): 3b9e233

Update README.md

Files changed (1)
  1. README.md +8 -25
README.md CHANGED
@@ -3,33 +3,27 @@ sdk: gradio
  sdk_version: 5.6.0
  ---
  # Whisper-WebUI
- A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper). You can use it as an Easy Subtitle Generator!
-
- ![Whisper WebUI](https://github.com/jhj0517/Whsiper-WebUI/blob/master/screenshot.png)
+ A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper).
  
  ## Notebook
  If you wish to try this on Colab, you can do it in [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!
  
- # Feature
+ # Features
  - Select the Whisper implementation you want to use between :
  - [openai/whisper](https://github.com/openai/whisper)
  - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
  - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
  - Generate subtitles from various sources, including :
  - Files
- - Youtube
  - Microphone
- - Currently supported subtitle formats :
- - SRT
- - WebVTT
- - txt ( only text file without timeline )
+ - Currently supported output formats :
+ - csv
+ - srt
+ - txt
  - Speech to Text Translation
  - From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
- - Text to Text Translation
  - Translate subtitle files using Facebook NLLB models
- - Translate subtitle files using DeepL API
- - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
- - Pre-processing audio input to separate BGM with [UVR](https://github.com/Anjok07/ultimatevocalremovergui), [UVR-api](https://github.com/NextAudioGen/ultimatevocalremover_api).
+ - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
  - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
  - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.
  1. https://huggingface.co/pyannote/speaker-diarization-3.1
@@ -107,15 +101,4 @@ This is Whisper's original VRAM usage table for models.
  | large | 1550 M | N/A | `large` | ~10 GB | 1x |
  
  
- `.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
-
- ## TODO🗓
-
- - [x] Add DeepL API translation
- - [x] Add NLLB Model translation
- - [x] Integrate with faster-whisper
- - [x] Integrate with insanely-fast-whisper
- - [x] Integrate with whisperX ( Only speaker diarization part )
- - [x] Add background music separation pre-processing with [UVR](https://github.com/Anjok07/ultimatevocalremovergui)
- - [ ] Add fast api script
- - [ ] Support real-time transcription for microphone
+ `.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
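
The diff above changes the README's list of output formats to csv, srt, and txt. As a rough illustration of what those three formats could look like for the same transcript, here is a minimal sketch. It assumes segments are plain `(start_seconds, end_seconds, text)` tuples. The function names and the CSV column layout (`start`, `end`, `text`) are hypothetical and not taken from Whisper-WebUI, whose actual writers may differ.

```python
import csv
import io


def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_txt(segments):
    """Render only the text of each segment, one per line, without timing."""
    return "\n".join(text.strip() for _, _, text in segments) + "\n"


def to_csv(segments):
    """Render segments as CSV; the column layout here is an assumption."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "end", "text"])
    for start, end, text in segments:
        writer.writerow([f"{start:.3f}", f"{end:.3f}", text.strip()])
    return buffer.getvalue()


# Hypothetical sample data, standing in for real transcription output.
segments = [
    (0.0, 2.5, "Hello there."),
    (2.5, 61.042, "Second line, with a comma."),
]
```

Calling `to_srt(segments)`, `to_txt(segments)`, or `to_csv(segments)` returns a string that can be written to a file. The txt output drops timing information entirely, while the csv writer quotes any field that contains a comma.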