Spaces:

udayl
/

NotebookLM-Kokoro_TTS_App

Running

App Files Files Community

NotebookLM-Kokoro_TTS_App / README.md

udayl

updated Dockerfile

bfc166e about 2 months ago

preview code

raw

history blame contribute delete

2.04 kB

	---
	title: NotebookLM-Kokoro TTS Project
	sdk: docker
	app_file: gradio_app.py
	pinned: true
	---

	# NotebookLM-Kokoro TTS Project

	This project uses [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M) – a lightweight, open-weight TTS model with 82 million parameters – to create a Google NotebookLM style Text-to-Speech application.

	## Why Kokoro?

	- Non-Proprietary & Open-Source: Kokoro is best in its class as a non-proprietary model, giving you full flexibility to deploy in production environments or personal projects.
	- High Efficiency: Despite its lightweight architecture, Kokoro delivers comparable quality to larger models while being faster and more cost-efficient.
	- Benchmarks: According to benchmarks available on the [TTS-Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena) page, Kokoro outperforms many closed-source models, making it the ideal choice for open deployments.
	- Easy Integration: With simple pip and Homebrew installation for dependencies like espeak-ng, integration into Python projects is straightforward.

	## Setup Instructions

	### Environment Setup

	This project uses the uv Python package manager. Follow these steps:

	1. Install uv:

	```bash
	pip install uv
	```

	2. Create a new environment named `notebooklm`:

	```bash
	uv venv
	```

	3. Activate the environment:

	```bash
	source .venv/bin/activate
	```

	4. Install Python dependencies:

	```bash
	pip install "kokoro>=0.9.2" soundfile torch
	```

	5. Install espeak-ng (Mac users):

	```bash
	brew install espeak-ng
	```

	### Running the Application

	Once the environment is set up, run the main TTS script as follows:

	```bash
	python notebook_lm_kokoro.py
	```

	This will process the transcript text using Kokoro and output audio segments as WAV files.

	## Conclusion

	Kokoro’s combination of efficiency, quality, and open-access makes it the best non-proprietary TTS model available, as confirmed by recent benchmarks. Enjoy exploring and extending this project!