Merge pull request #340 from jhj0517/feature/update-installation-guide
README.md
CHANGED
@@ -25,33 +25,21 @@ If you wish to try this on Colab, you can do it in [here](https://colab.research
 - Translate subtitle files using Facebook NLLB models
 - Translate subtitle files using DeepL API
 - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
-- Pre-processing audio input to separate BGM with [UVR](https://github.com/Anjok07/ultimatevocalremovergui)
 - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
   - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.
     1. https://huggingface.co/pyannote/speaker-diarization-3.1
     2. https://huggingface.co/pyannote/segmentation-3.0

 # Installation and Running
-### Prerequisite
-To run this WebUI, you need to have `git`, `python` version 3.8 ~ 3.10, and `FFmpeg`. <br>
-And if you're not using an Nvidia GPU, or using a CUDA version other than 12.4, edit the [`requirements.txt`](https://github.com/jhj0517/Whisper-WebUI/blob/master/requirements.txt) to match your environment.
-
-Please follow the links below to install the necessary software:
-- git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
-- python : [https://www.python.org/downloads/](https://www.python.org/downloads/) **(If your python version is too new, torch will not install properly.)**
-- FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
-- CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
-
-1.
-2.
-3. Start WebUI
-4. To update the WebUI, run `update.bat` or `update.sh`
-
-And you can also run the project with command line arguments if you'd like to; see the [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for a guide to the arguments.

 ## Running with Docker
@@ -79,6 +67,31 @@ docker compose up

 If needed, update the [`docker-compose.yaml`](https://github.com/jhj0517/Whisper-WebUI/blob/master/docker-compose.yaml) to match your environment.

 # VRAM Usages
 This project is integrated with [faster-whisper](https://github.com/guillaumekln/faster-whisper) by default for better VRAM usage and transcription speed.
 - Translate subtitle files using Facebook NLLB models
 - Translate subtitle files using DeepL API
 - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
+- Pre-processing audio input to separate BGM with [UVR](https://github.com/Anjok07/ultimatevocalremovergui).
 - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
   - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.
     1. https://huggingface.co/pyannote/speaker-diarization-3.1
     2. https://huggingface.co/pyannote/segmentation-3.0

 # Installation and Running
+## Running with Pinokio
+
+The app can also be run with [Pinokio](https://github.com/pinokiocomputer/pinokio).
+
+1. Install the [Pinokio software](https://program.pinokio.computer/#/?id=install).
+2. Open the software, search for Whisper-WebUI, and install it.
+3. Start Whisper-WebUI and connect to `http://localhost:7860`.

 ## Running with Docker
 If needed, update the [`docker-compose.yaml`](https://github.com/jhj0517/Whisper-WebUI/blob/master/docker-compose.yaml) to match your environment.
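What "match your environment" typically means here is the port mapping and any host paths. The excerpt below is a hypothetical shape of such a file, not the repository file's actual contents; the service name and paths are placeholders, so check the linked `docker-compose.yaml` for the real keys.

```yaml
# Hypothetical docker-compose.yaml excerpt -- names and paths are placeholders.
services:
  whisper-webui:
    ports:
      - "7860:7860"            # change the left side to expose the UI on another host port
    volumes:
      - ./models:/app/models   # keep downloaded Whisper models on the host between runs
```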
+## Run Locally
+
+### Prerequisite
+To run this WebUI, you need to have `git`, `python` version 3.8 ~ 3.10, and `FFmpeg`. <br>
+And if you're not using an Nvidia GPU, or using a CUDA version other than 12.4, edit the [`requirements.txt`](https://github.com/jhj0517/Whisper-WebUI/blob/master/requirements.txt) to match your environment.
+
+Please follow the links below to install the necessary software:
+- git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
+- python : [https://www.python.org/downloads/](https://www.python.org/downloads/) **(If your python version is too new, torch will not install properly.)**
+- FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
+- CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
+
+After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
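Before running the installer, it may help to confirm the prerequisites are actually reachable from a terminal. This quick check is not part of the official guide, just a sanity pass:

```shell
# Confirm the prerequisites are installed and reachable on PATH.
git --version
python --version || python3 --version       # should report 3.8.x - 3.10.x
ffmpeg -version >/dev/null 2>&1 && echo "FFmpeg OK" || echo "FFmpeg missing, or FFmpeg/bin is not on PATH"
```

If the FFmpeg line fails on Windows even though FFmpeg is installed, the `FFmpeg/bin` folder is most likely missing from the system PATH.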
+### Automatic Installation
+
+1. git clone this repository
+```shell
+git clone https://github.com/jhj0517/Whisper-WebUI.git
+```
+2. Run `install.bat` or `install.sh` to install dependencies. (This will create a `venv` directory and install dependencies there.)
+3. Start the WebUI with `start-webui.bat` or `start-webui.sh`.
+
+And you can also run the project with command line arguments if you'd like to; see the [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for a guide to the arguments.
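Under the hood, `install.sh` and `start-webui.sh` broadly amount to standard `venv` usage. The sketch below is an assumption about the scripts' behavior, with `app.py` as a hypothetical entry-point name; the scripts in the repository are the authoritative version:

```shell
# Sketch of what install.sh / start-webui.sh broadly do (an assumption --
# read the scripts in the repo for the authoritative steps).
python3 -m venv venv                  # install.sh: create an isolated ./venv
. venv/bin/activate                   # (Windows: venv\Scripts\activate)
# pip install -r requirements.txt    # install.sh: install pinned dependencies; needs the repo checkout
# python app.py                      # start-webui.sh: hypothetical entry-point name
```

Knowing this mapping is useful when the batch/shell scripts fail partway: you can re-run the individual steps by hand inside the repo directory.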
 # VRAM Usages
 This project is integrated with [faster-whisper](https://github.com/guillaumekln/faster-whisper) by default for better VRAM usage and transcription speed.