---
license: mit
base_model:
- hexgrad/Kokoro-82M
---

# **VocRT**

This repository contains the complete codebase for building your own Realtime Voice-to-Voice (V2V) solution. It integrates a TTS model, gRPC communication, an Express server, and a React-based client. Follow this guide to set up and explore the system.

---

## **Repository Structure**

```
├── backend/       # Express server for handling API requests
├── frontend/      # React client for user interaction
├── .env           # Environment variables (OpenAI API key, etc.)
├── voices         # All available voices
├── demo           # Contains sample audio and demo files
├── other...
```

---

## **Docker**

🐳 VocRT on Docker Hub: https://hub.docker.com/r/anuragsingh922/vocrt

---

## **Setup Guide**

### **Step 1: Clone the Repository**

Clone this repository to your local machine:

```bash
git clone https://huggingface.co/anuragsingh922/VocRT
cd VocRT
```

---

### **Step 2: Python Virtual Environment Setup**

Create a virtual environment to manage dependencies:

#### macOS/Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

#### Windows:

```cmd
python -m venv venv
venv\Scripts\activate
```

---

### **Step 3: Install Python Dependencies**

With the virtual environment activated, install the required dependencies:

```bash
pip install --upgrade pip setuptools wheel
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install phonemizer transformers scipy munch python-dotenv openai grpcio grpcio-tools
```

### **Installing eSpeak**

`eSpeak` is a required dependency of the VocRT system. Follow the instructions below to install it on your platform:

#### **Ubuntu/Linux**

Use the `apt-get` package manager to install `eSpeak`:

```bash
sudo apt-get update
sudo apt-get install espeak
```

#### **macOS**

Install `eSpeak` using [Homebrew](https://brew.sh/):

1. Ensure Homebrew is installed on your system:

   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
   ```

2. Install `espeak`:

   ```bash
   brew install espeak
   ```

#### **Windows**

For Windows, follow these steps to install `eSpeak`:

1. Download the eSpeak installer from the official website: [eSpeak Downloads](http://espeak.sourceforge.net/download.html).
2. Run the installer and follow the on-screen instructions to complete the installation.
3. Add the `eSpeak` installation path to your system's `PATH` environment variable:
   - Open **System Properties** → **Advanced** → **Environment Variables**.
   - In the "System Variables" section, find the `Path` variable and edit it.
   - Add the path to the directory containing `espeak.exe` (e.g., `C:\Program Files (x86)\eSpeak`).
4. Verify the installation. Open Command Prompt and run:

   ```cmd
   espeak --version
   ```

---

### **Verification**

After installing `eSpeak`, verify it is correctly set up by running:

```bash
espeak "Hello, world!"
```

You should hear "Hello, world!" spoken through your system's audio output.

---

### **Step 4: Backend Setup (Express Server)**

1. Navigate to the `backend` directory:

   ```bash
   cd backend
   ```

2. Install Node.js dependencies:

   ```bash
   npm install
   ```

3. Update the `config.env` file with your Deepgram API key:
   - Open `config.env` in a text editor.
   - Replace the placeholder value with your actual Deepgram API key.

4. Start the Express server:

   ```bash
   node app.js
   ```

---

### **Step 5: Frontend Setup (React Client)**

1. Open a new terminal and navigate to the `frontend` directory:

   ```bash
   cd frontend
   ```

2. Install client dependencies:

   ```bash
   npm install
   ```

3. Start the client:

   ```bash
   npm start
   ```

---

### **Step 6: Start the VocRT Server**

1. Add your OpenAI API key to the `.env` file:
   - Open `.env` in a text editor.
   - Replace the placeholder value with your actual OpenAI API key.

2. Start the VocRT server:

   ```bash
   python3 app.py
   ```

---

### **Step 7: Test the Full System**

Once all servers are running:

1. Access the React client at [http://localhost:3000](http://localhost:3000).
2. Interact with the VocRT system via the web interface.

---

## **Model Used**

VocRT uses [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) for text-to-speech synthesis, producing the high-quality voice responses returned to the user.

---

## **Key Features**

1. **Realtime voice response generation**: Converts speech input into spoken responses with minimal latency.
2. **React Client**: A user-friendly frontend for interaction.
3. **Express Backend**: Handles API requests and integrates the VocRT system with external services.
4. **gRPC Communication**: Connects the VocRT server with the other components.
5. **Configurable APIs**: Integrates with the OpenAI and Deepgram APIs for speech recognition and text generation.

---

## **Dependencies**

### Python:

- torch, torchvision, torchaudio
- phonemizer
- transformers
- scipy
- munch
- python-dotenv
- openai
- grpcio, grpcio-tools
- espeak (system package, not installed via `pip`; see **Installing eSpeak**)

### Node.js:

- Express server dependencies (`npm install` in `backend`).
- React client dependencies (`npm install` in `frontend`).

---

## **Contributing**

Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.

---

## **Acknowledgments**

- [Hugging Face](https://huggingface.co/) for hosting the Kokoro-82M model.
- The amazing communities behind PyTorch and the OpenAI and Deepgram APIs.
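## **Appendix: Setup Preflight Check**

The setup guide depends on several command-line tools and two configuration files, and a missing one usually only shows up when a server fails to start. The sketch below is not part of the VocRT codebase. It is a hypothetical helper that checks whether the tools named in the guide are on `PATH` and whether the configuration files exist. The `preflight` function, its default lists, and the `backend/config.env` location are assumptions inferred from the steps above; adjust them to match your checkout (on Windows, for example, the interpreter is usually `python` rather than `python3`).

```python
import shutil
from pathlib import Path

# Assumed defaults inferred from the setup guide; not read from the VocRT code.
REQUIRED_COMMANDS = ("python3", "node", "npm", "espeak")
REQUIRED_FILES = (".env", "backend/config.env")


def preflight(root=".", commands=REQUIRED_COMMANDS, files=REQUIRED_FILES):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # shutil.which returns None when the executable is not found on PATH.
    for cmd in commands:
        if shutil.which(cmd) is None:
            problems.append(f"command not found on PATH: {cmd}")

    # Configuration files are resolved relative to the repository root.
    base = Path(root)
    for rel in files:
        if not (base / rel).is_file():
            problems.append(f"missing file: {rel}")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("WARN:", issue)
    print("OK" if not issues else f"{len(issues)} issue(s) found")
```

Run it from the repository root before starting the servers. It only confirms that the tools and files are present; it does not validate the API keys themselves.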