Update README.md
README.md
* [β Support Wav2lip Studio](#-support-wav2lip-studio)

## π Updates

**2024.10.13 Add avatar for driving video**
- πͺ Add 10 new avatars for the driving video; you can now choose an avatar before generating the driving video.
- πΊ Add a feature to close (or not) the mouth before generating the lip-sync video.
- π Easy Docker installation; follow the instructions below.
- β» Better macOS integration; follow the instructions below.
- π In the ComfyUI panel, you can now regenerate the mask and keyframes after modifying your video, which allows a better mouth mask.

**2024.09.03 ComfyUI Integration in Lip Sync Studio**
- πͺ Manage and chain your ComfyUI workflows from end to end.

**2024.08.07 Major Update (Standalone version only)**
- πΊ "Add Driving video feature": this feature lets you generate a driving video to obtain better lip sync.

**2024.05.06 Major Update (Standalone version only)**
- π "Data Structure": I had to restructure the files to allow for better quality in the video output. The previous version did everything in RAM at the expense of video quality; each pass degraded the videos. For example, if you did a face swap + Wav2Lip, quality was lost because a first pass was created for Wav2Lip and a second for the face swap. You will now find a "data" directory in each project containing all the files necessary for the tool's work, maintaining quality through the different passes (quality above all).
- β» "Wav2Lip Video Outputs": After generating Wav2Lip videos, the videos are numbered in the output directory. Clicking on "video quality" loads the last video of the specified quality.
- π "Zero Mouth": this feature should allow closing a person's mouth before proceeding with lip-syncing. Sometimes it doesn't have much effect or can add some flickering to the image, but I have had good results in some cases. Technically, this takes two passes to close the mouth; you will find the frames used by the tool in "data\zero".
- π¬ "Clone Voice": the interface has been revised.
- πͺ "High Quality Vs Best Quality": In this version, there is not much difference between High and Best. Best is meant for videos where faces are large on the screen, as in a 4K video. Behind the scenes it simply uses different GFPGAN models and a different face alignment.
- βΆ "Show Frame Number": In Low Quality only, the frame number appears in the top left corner. This helps identify the frame where you want to make modifications.
- πΊ "Show Wav2Lip Output": this feature allows you to see the Wav2Lip output taking the input audio into account.
- "New Face Alignment": The face alignment has been revised.
- π "Zoom In, Zoom Out, Move Right,...": Now you will understand why the results are sometimes strange and generate deformed lips, broken teeth, or other oddities. I recommend the video tutorial here: https://www.patreon.com/posts/key-feature-103716855

**2024.02.09 Speed Up Update (Standalone version only)**
- π¬ Clone voice: Add controls to manage the voice clone (see Usage section)
- π Translate video: Add features to the translate panel to manage translation (see Usage section)
- πΊ Add Trim feature: Add a feature to trim the video.
- π Automatic mask: Add a feature to automatically calculate the mask parameters (padding, dilate, ...). You can change the parameters if needed.
- π Speed up processes: All processes are now faster (Analysis, Face Swap, High-quality Generation).
- πͺ Less disk space used: Temporary files are removed after generation and only the necessary data is kept, which greatly reduces disk usage.

**2024.01.20 Major Update (Standalone version only)**
- β» Manage project: Add a feature to manage multiple projects
- π° Control debug
- π Fix resize factor bug

# π» Installation

## π Requirements (Windows, Linux, macOS)
1. FFmpeg: download it from the [official FFmpeg site](https://ffmpeg.org/download.html). Follow the instructions appropriate for your operating system; note that ffmpeg has to be accessible from the command line.
   - Make sure ffmpeg is in your PATH environment variable. If it is not, add it (a minimal example is shown after this list).
2. pyannote.audio: You need to agree to share your contact information to access the pyannote models (see the note on access tokens after this list).
To do so, go to both links:
- [pyannote diarization-3.1 huggingface repository](https://huggingface.co/pyannote/speaker-diarization-3.1)
- [pyannote segmentation-3.0 huggingface repository](https://huggingface.co/pyannote/segmentation-3.0)
Set each field and click "Agree and access repository".
3. Install [python 3.10.11](https://www.python.org/downloads/release/python-31011/) (for Mac users, follow the instructions below)
4. Install [git](https://git-scm.com/downloads)
5. Check the ffmpeg, python, CUDA, and git installations
```bash
python --version
git --version
ffmpeg -version
nvcc --version
# nvcc should report the CUDA 11.8 toolkit, for example:
# Cuda compilation tools, release 11.8, V11.8.89
# Build cuda_11.8.r11.8/compiler.31833905_0
```

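If `ffmpeg` is not found by step 1, add its folder to your PATH. This is a minimal sketch for Linux/macOS shells; the folder `/opt/ffmpeg/bin` is only a placeholder for wherever you installed FFmpeg, and on Windows you would edit the PATH variable in the system environment settings instead.
```bash
# Make the FFmpeg binaries visible in the current shell (placeholder path)
export PATH="$PATH:/opt/ffmpeg/bin"
# Persist it for future shells
echo 'export PATH="$PATH:/opt/ffmpeg/bin"' >> ~/.bashrc
# Verify that the binary now resolves
ffmpeg -version
```
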
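For step 2, the pyannote models are gated on Hugging Face, so an access token is usually required in addition to accepting the conditions. One common way to provide it is the Hugging Face CLI login shown below; this assumes the tool reads credentials from the standard `huggingface_hub` store, and if Wav2lip Studio expects the token in its own configuration file instead, put it there.
```bash
# Prompts for a Hugging Face access token and stores it for later use
huggingface-cli login
```
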
## Linux Users
1. Make sure git-lfs is installed
```bash
sudo apt-get install git-lfs
```

## Windows Users
1. Install [Cuda 11.8](https://developer.nvidia.com/cuda-11-8-0-download-archive) if it is not already installed.

2. Install [Visual Studio](https://visualstudio.microsoft.com/fr/downloads/). During the install, make sure to include the Python and C++ packages in the Visual Studio installer.
4. Double-click on wav2lip-studio.bat; it will install the requirements and download the models.

## MACOS Users

1. Install python 3.9
```
brew update
brew install [email protected]
brew install git-lfs
git-lfs install
xcode-select --install
```
2. Unzip the wav2lip-studio.zip in a folder
```
unzip wav2lip-studio.zip
```
3. Install the environment and requirements

```
cd /YourWav2lipStudioFolder
/opt/homebrew/bin/python3.9 -m venv venv
./venv/bin/python3.9 -m pip install inaSpeechSegmenter
./venv/bin/python3.9 -m pip install tyro==0.8.5 pykalman==0.9.7
./venv/bin/python3.9 -m pip install TTS==0.21.2
./venv/bin/python3.9 -m pip install transformers==4.33.2
./venv/bin/python3.9 -m pip install numpy==1.24.4
```
3.1. For Apple silicon, one more step is needed (a quick sanity check is shown after this list)
```
./venv/bin/python3.9 -m pip uninstall torch torchvision torchaudio
./venv/bin/python3.9 -m pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
sed -i '' 's/from torchvision.transforms.functional_tensor import rgb_to_grayscale/from torchvision.transforms.functional import rgb_to_grayscale/' venv/lib/python3.9/site-packages/basicsr/data/degradations.py
```

4. Install models (see the Git LFS note after this list if the cloned files arrive as tiny pointer files)
```
git clone https://huggingface.co/numz/wav2lip_studio-0.2 models
git clone https://huggingface.co/KwaiVGI/LivePortrait models/pretrained_weights
```
5. Launch UI
```
mkdir projects
./venv/bin/python3.9 wav2lip_studio.py
```

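On Apple silicon, you can quickly confirm that the nightly PyTorch build installed in step 3.1 is active and that the Metal (MPS) backend is visible. This is only a sanity check and is not required by the tool itself.
```bash
# Should print the torch version and True if the MPS backend is usable
./venv/bin/python3.9 -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"
```
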
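If the model files cloned in step 4 are only a few bytes each, they are Git LFS pointer files rather than the actual weights, which usually means Git LFS was not active during the clone. This generic recovery works for any LFS-backed repository:
```bash
# Enable Git LFS for your user, then fetch the real files in both model repositories
git lfs install
cd models && git lfs pull && cd ..
cd models/pretrained_weights && git lfs pull && cd ../..
```
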
# Tutorial
- [FR version](https://youtu.be/43Q8YASkcUA)
- [EN Version](https://youtu.be/B84A5alpPDc)

# π Usage
## PARAMETERS
1. Enter a project name and press Enter.
2. Choose a video (avi or mp4 format). Note that an avi file will not appear in the Video input preview, but the process will still work (a sample conversion command is shown after this list).
3. Input Video: Allows using the audio from the input video, voice cloning, and translation. See the [Input Video](#input-video) section for more details.
11. **Driving Video**: Choose an avatar to generate a driving video.
    - **Avatars**: Choose between 10 avatars to use for the driving video; each one gives a different driving result on the lip-sync output video.
    - **Close Mouth**: Close the mouth of the avatar before generating the driving video.
    - **Generate Driving Video**: Generate the driving video.
12. **Video Quality**:
    - **Low**: Original Wav2Lip quality, fast but not very good.
    - **Medium**: Better quality by applying post-processing on the mouth, slower.
    - **High**: Better quality by applying post-processing and upscaling the mouth, slower.
13. **Wav2lip Checkpoint**: Choose between 2 Wav2Lip models:
    - **Wav2lip**: Original Wav2Lip model, fast but not very good.
    - **Wav2lip GAN**: Better quality by applying post-processing on the mouth, slower.
14. **Face Restoration Model**: Choose between 2 face restoration models:
    - **Code Former**:
        - A value of 0 offers higher quality but may significantly alter the person's facial appearance and cause noticeable flickering between frames.
        - A value of 1 provides lower quality but maintains the person's face more consistently and reduces frame flickering.
        - Using a value below 0.5 is not advised. Adjust this setting to achieve optimal results. Starting with a value of 0.75 is recommended.
    - **GFPGAN**: Usually better quality.
15. **Volume Amplifier**: Does not amplify the volume of the output audio; it amplifies the volume of the audio sent to Wav2Lip, which gives you better control over the lip movement.

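If your source is an avi file (see step 2 above), you can also convert it to mp4 yourself before importing it. This is a generic FFmpeg command rather than anything specific to Wav2lip Studio; adjust the file names to your own video:
```bash
# Re-encode an avi file to an H.264/AAC mp4 that previews correctly in the UI
ffmpeg -i input.avi -c:v libx264 -crf 18 -c:a aac output.mp4
```
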
## KEYFRAMES MANAGER


For each segment of the translated text, you can:
- Delete the segment by clicking on the trash button.
- Add a new segment under this one by clicking on the arrow-down button.

# πΊ Examples

https://user-images.githubusercontent.com/800903/262439441-bb9d888a-d33e-4246-9f0a-1ddeac062d35.mp4

https://user-images.githubusercontent.com/800903/267808494-300f8cc3-9136-4810-86e2-92f2114a5f9a.mp4

# π Behind the scenes

This extension operates in several stages to improve the quality of Wav2Lip-generated videos:

4. **Mask Creation**: The script creates a mask around the mouth and tries to keep other facial motions, like those of the cheeks and chin.
5. **Video Generation**: The script then takes the high-quality mouth image and overlays it onto the original image, guided by the mouth mask.

# πͺ Quality tips
- Use a high-quality video as input.
- Use a video with a consistent frame rate; a quick way to check it is shown after this list. Occasionally, videos may exhibit unusual playback frame rates (not the standard 24, 25, 30, or 60), which can lead to issues with the face mask.
- Use a high-quality audio file as input, without background noise or music. Clean the audio with a tool like [https://podcast.adobe.com/enhance](https://podcast.adobe.com/enhance).
- Keep Mask Blur at most twice the value of Mouth Mask Dilate (for example, with a dilate of 15, use a blur of 30 or less). If you want to increase the blur, increase the value of Mouth Mask Dilate; otherwise the mouth will be blurred and the underlying mouth could become visible.
- Upscaling can improve the result, particularly around the mouth area, but it will extend the processing duration. Use this tutorial from Olivio Sarikas to upscale your video: [https://www.youtube.com/watch?v=3z4MKUqFEUk](https://www.youtube.com/watch?v=3z4MKUqFEUk). Ensure the denoising strength is set between 0.0 and 0.05, select the 'revAnimated' model, and use batch mode. I'll create a tutorial for this soon.

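To check the frame rate mentioned in the tips above, ffprobe (installed alongside FFmpeg) can print it directly; this is a generic diagnostic command, independent of the tool:
```bash
# Prints the video stream's average frame rate, e.g. 25/1 or 30000/1001
ffprobe -v error -select_streams v:0 -show_entries stream=avg_frame_rate -of default=noprint_wrappers=1 input.mp4
```
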
# β Noted Constraints
- To speed up the process, try to keep the resolution under 1000x1000 px and upscale after processing.
- If the initial phase is excessively lengthy, consider using the "resize factor" to decrease the video's dimensions.
- While there is no strict size limit for videos, larger videos require more processing time. It's advisable to use the "resize factor" to minimize the video size and then upscale the video once processing is complete.

# Known issues
If you have issues installing insightface, follow these steps:
- Download [insightface precompiled](https://github.com/Gourieff/Assets/raw/main/Insightface/insightface-0.7.3-cp310-cp310-win_amd64.whl) and paste it into the root folder of Wav2lip-studio
- In a terminal, go to the wav2lip-studio folder and type the following commands:
```
python -m pip install insightface-0.7.3-cp310-cp310-win_amd64.whl
```
Enjoy

# π To do
- βοΈ Standalone version
- βοΈ Add a way to use a face swap image
- βοΈ Add the possibility to use a video for audio input
- βοΈ Convert avi to mp4. Avi is not shown in the video input, but the process works fine
- [ ] ComfyUI integration

# π Contributing

We welcome contributions to this project. When submitting pull requests, please provide a detailed description of the changes. See [CONTRIBUTING](CONTRIBUTING.md) for more information.

# π Appreciation
- [Wav2Lip](https://github.com/Rudrabha/Wav2Lip)
- [CodeFormer](https://github.com/sczhou/CodeFormer)
- [Coqui TTS](https://github.com/coqui-ai/TTS)
- [facefusion](https://github.com/facefusion/facefusion)
- [Vocal Remover](https://github.com/tsurumeso/vocal-remover)

# β Support Wav2lip Studio

This project is an open-source effort that is free to use and modify. I rely on the support of users to keep this project going and to help improve it. If you'd like to support me, you can make a donation on my Patreon page. Any contribution, large or small, is greatly appreciated!

[patreon page](https://www.patreon.com/Wav2LipStudio)

# π Citation
If you use this project in your own work, in articles, tutorials, or presentations, we encourage you to cite this project to acknowledge the efforts put into it.

To cite this project, please use the following BibTeX format:

# π License
* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).