---
title: NotebookLM-Kokoro TTS Project
sdk: docker
app_file: gradio_app.py
pinned: true
---

# NotebookLM-Kokoro TTS Project

This project uses [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M), a lightweight, open-weight TTS model with 82 million parameters, to create a Google NotebookLM-style text-to-speech application.

## Why Kokoro?

- **Non-Proprietary & Open-Source:** Kokoro's weights are openly available, so you are free to deploy it in production environments or personal projects.
- **High Efficiency:** With only 82 million parameters, Kokoro delivers quality comparable to larger models while being faster and cheaper to run.
- **Benchmarks:** According to the [TTS-Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena) leaderboard, Kokoro outperforms many closed-source models, which makes it a strong choice for open deployments.
- **Easy Integration:** Kokoro installs with pip, and its espeak-ng dependency installs with Homebrew, so integrating it into a Python project is straightforward.

## Setup Instructions

### Environment Setup

This project uses the **uv** Python package manager. Follow these steps:

1. **Install uv:**

   ```bash
   pip install uv
   ```

2. **Create a virtual environment (uv places it in `.venv` by default):**

   ```bash
   uv venv
   ```

3. **Activate the environment:**

   ```bash
   source .venv/bin/activate
   ```

4. **Install Python dependencies:**

   ```bash
   uv pip install "kokoro>=0.9.2" soundfile torch
   ```

5. **Install espeak-ng (macOS, via Homebrew):**

   ```bash
   brew install espeak-ng
   ```
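
After completing the steps above, a quick check can confirm that the environment is ready. The following is a minimal sketch, not part of this project: the `check_environment` helper and its messages are hypothetical, and it only tests that the three Python packages are importable and that an `espeak-ng` executable is on your `PATH`.

```python
import importlib.util
import shutil

# Packages and binary named in the setup steps of this README.
REQUIRED_PACKAGES = ("kokoro", "soundfile", "torch")


def check_environment(packages=REQUIRED_PACKAGES, binary="espeak-ng"):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not installed: {name}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(binary) is None:
        problems.append(f"Executable not found on PATH: {binary}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks ready.")
```

Run it with the virtual environment activated, so that the check inspects the same interpreter the application will use.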

### Running the Application

Once the environment is set up, run the main TTS script as follows:

```bash
python notebook_lm_kokoro.py
```

This will process the transcript text using Kokoro and output audio segments as WAV files.
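
For orientation, the core of such a pipeline can be sketched as below. This is an illustration based on the usage example in the Kokoro model card, not the contents of `notebook_lm_kokoro.py`: the `synthesize` and `segment_path` helpers, the `output` directory, and the file naming are hypothetical, while the `af_heart` voice and the `a` language code are taken from the model card example.

```python
from pathlib import Path

SAMPLE_RATE = 24000  # sample rate used in the Kokoro model card example


def segment_path(out_dir, index):
    """Build the path for the index-th segment, e.g. output/segment_003.wav."""
    return Path(out_dir) / f"segment_{index:03d}.wav"


def synthesize(text, out_dir="output", voice="af_heart", lang_code="a"):
    """Write one WAV file per segment that Kokoro yields and return the paths."""
    # Imported lazily so segment_path stays usable without the heavy dependencies.
    import soundfile as sf
    from kokoro import KPipeline

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pipeline = KPipeline(lang_code=lang_code)  # "a" selects American English
    paths = []
    # Each yielded item unpacks to (graphemes, phonemes, audio).
    for index, (_, _, audio) in enumerate(pipeline(text, voice=voice)):
        path = segment_path(out_dir, index)
        sf.write(str(path), audio, SAMPLE_RATE)
        paths.append(path)
    return paths
```

Calling `synthesize("Hello from Kokoro.")` should then leave one or more numbered WAV files in `output/`, depending on how the pipeline splits the text.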

## Conclusion

Kokoro’s combination of efficiency, quality, and open access makes it a strong non-proprietary TTS option, as its TTS-Arena results suggest. Enjoy exploring and extending this project!