# Introduction
!!! warning
    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.

This codebase is released under the `BSD-3-Clause` license, and all models are released under the CC-BY-NC-SA-4.0 license.

## Requirements

- GPU Memory: 4GB (for inference), 8GB (for fine-tuning)
- System: Linux, Windows

## Windows Setup

Professional Windows users may consider WSL2 or Docker to run the codebase.

Non-professional Windows users can use the following method to run the codebase without a Linux environment (with model compilation capabilities, aka `torch.compile`):

1. Unzip the project package.
2. Double-click `install_env.bat` to install the environment.
3. If `USE_MIRROR=preview` was set in step 2, carry out this step (optional; it activates the compiled model environment):
    1. Download the LLVM compiler using the following links:
    2. Download and install the Microsoft Visual C++ Redistributable package to resolve potential missing `.dll` issues.
    3. Download and install Visual Studio Community Edition to obtain the MSVC++ build tools, which resolve the LLVM header file dependencies.
        - Visual Studio Download
        - After installing Visual Studio Installer, download Visual Studio Community 2022.
        - Click the Modify button, find the Desktop development with C++ option, and check it for download.
    4. Install CUDA Toolkit 12.
4. Double-click `start.bat` to open the Fish-Speech training/inference configuration WebUI page.
5. (Optional) Double-click `run_cmd.bat` to enter the conda/python command line environment of this project.
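
Once the environment is installed, a quick check along the following lines can confirm that the GPU meets the memory figures listed under Requirements. This is a minimal sketch and not part of the project: the function names `supported_workloads` and `detect_gpu_memory` are hypothetical, it assumes the 4GB and 8GB figures mean decimal gigabytes, and it assumes device 0 is the GPU you intend to use. The only PyTorch calls it relies on are `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`.

```python
from typing import List, Optional

GB = 10**9  # assumption: the documented figures are decimal gigabytes
INFERENCE_MIN_BYTES = 4 * GB  # "4GB (for inference)" from Requirements
FINETUNE_MIN_BYTES = 8 * GB   # "8GB (for fine-tuning)" from Requirements


def supported_workloads(total_bytes: int) -> List[str]:
    """Return the documented workloads that fit in `total_bytes` of GPU memory."""
    workloads = []
    if total_bytes >= INFERENCE_MIN_BYTES:
        workloads.append("inference")
    if total_bytes >= FINETUNE_MIN_BYTES:
        workloads.append("fine-tuning")
    return workloads


def detect_gpu_memory(device_index: int = 0) -> Optional[int]:
    """Return the total memory of a CUDA device in bytes, or None if unavailable."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except (ImportError, OSError):
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory


if __name__ == "__main__":
    total = detect_gpu_memory()
    if total is None:
        print("No CUDA GPU detected (or PyTorch is not installed).")
    else:
        names = supported_workloads(total)
        summary = ", ".join(names) or "below the documented minimum"
        print(f"GPU 0: {total / GB:.1f} GB total -> {summary}")
```

On Windows, the script could be run from the command line opened by `run_cmd.bat`; on Linux, from the activated `fish-speech` conda environment. The thresholds are compared against total device memory, so memory already in use by other processes is not accounted for.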

## Linux Setup

```bash
# Create a Python 3.10 virtual environment (you can also use virtualenv)
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install PyTorch
pip3 install torch torchvision torchaudio

# Install fish-speech
pip3 install -e .

# (Ubuntu / Debian users) Install sox
apt install libsox-dev
```

## Changelog

- 2024/07/02: Updated Fish-Speech to version 1.2, removed the VITS decoder, and greatly enhanced zero-shot ability.
- 2024/05/10: Updated Fish-Speech to version 1.1, implemented the VITS decoder to reduce WER and improve timbre similarity.
- 2024/04/22: Finished Fish-Speech version 1.0, significantly modified the VQGAN and LLAMA models.
- 2023/12/28: Added `lora` fine-tuning support.
- 2023/12/27: Added `gradient checkpointing`, `causal sampling`, and `flash-attn` support.
- 2023/12/19: Updated the WebUI and HTTP API.
- 2023/12/18: Updated the fine-tuning documentation and related examples.
- 2023/12/17: Updated the `text2semantic` model to support phoneme-free mode.
- 2023/12/13: Beta version released, including the VQGAN model and a language model based on LLAMA (phoneme support only).

## Acknowledgements

- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)