	---
	language:
	- en
	tags:
	- myshell
	- speech-to-speech
	---
	<!-- might put a [width=2000 * height=xxx] img here, this size best fits git page
	<img src="resources\cover.png"> -->
	<img src="resources/dreamvoice.png">

	# DreamVoice: Text-guided Voice Conversion

	--------------------

	## Introduction

	DreamVoice is an innovative approach to voice conversion (VC) that leverages text-guided generation to create personalized and versatile voice experiences.
	Unlike traditional VC methods, which require a target recording during inference, DreamVoice introduces a more intuitive solution by allowing users to specify desired voice timbres through text prompts.

	For more details, please see our Interspeech paper: [DreamVoice](https://arxiv.org/abs/2406.16314)

	To listen to demos and download the dataset, please visit the DreamVoice homepage: [Homepage](https://haidog-yaqub.github.io/dreamvoice_demo/)


	# How to Use

	To load the models, first install the required packages:

	```
	pip install -r requirements.txt
	```

	Then you can use the model with the following code:

	- DreamVoice Plugin for FreeVC (DreamVG + [FreeVC](https://github.com/OlaWod/FreeVC))

	```python
	import torch
	import librosa
	import soundfile as sf
	from dreamvoice import DreamVoice_Plugin
	from dreamvoice.freevc_wrapper import get_freevc_models, convert

	device = 'cuda'
	freevc, cmodel, hps = get_freevc_models('ckpts_freevc/', 'dreamvoice/', device)

	# init dreamvoice
	dreamvoice = DreamVoice_Plugin(config='plugin_freevc.yaml', device=device)

	# generate speaker
	prompt = "old female's voice, deep and dark"
	target_se = dreamvoice.gen_spk(prompt)

	# content source
	source_path = 'examples/test1.wav'
	audio_clip = librosa.load(source_path, sr=16000)[0]
	audio_clip = torch.tensor(audio_clip).unsqueeze(0).to(device)
	content = cmodel(audio_clip).last_hidden_state.transpose(1, 2).to(device)

	# voice conversion
	output, out_sr = convert(freevc, content, target_se)
	sf.write('output.wav', output, out_sr)
	```
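The example above hard-codes `device = 'cuda'`. On a machine without a GPU, a small fallback helper may be useful. The sketch below is only an illustration: `pick_device` is a hypothetical helper name, not part of DreamVoice, and it assumes the models can also run on CPU, which this README does not confirm.

```python
def pick_device(preferred="cuda"):
    """Return `preferred` if it is usable, otherwise fall back to 'cpu'.

    Hypothetical helper, not part of the DreamVoice package.
    """
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() reports whether a CUDA device can be used
    if preferred.startswith("cuda") and not torch.cuda.is_available():
        return "cpu"
    return preferred


device = pick_device()
print(f"Using device: {device}")
```

The returned string can then be passed wherever the examples use `device`, for instance `DreamVoice_Plugin(config='plugin_freevc.yaml', device=device)`.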

	- DreamVoice Plugin for OpenVoice (DreamVG + [OpenVoice](https://github.com/myshell-ai/OpenVoice))

	```python
	import torch
	from dreamvoice import DreamVoice_Plugin
	from dreamvoice.openvoice_utils import se_extractor
	from openvoice.api import ToneColorConverter

	# device used below when moving the source speaker embedding
	device = 'cuda'

	# init dreamvoice
	dreamvoice = DreamVoice_Plugin(device=device)

	# init openvoice
	ckpt_converter = 'checkpoints_v2/converter'
	openvoice = ToneColorConverter(f'{ckpt_converter}/config.json', device='cuda')
	openvoice.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

	# generate speaker
	prompt = 'young female voice, sounds young and cute'
	target_se = dreamvoice.gen_spk(prompt)
	target_se = target_se.unsqueeze(-1)

	# content source
	source_path = 'examples/test2.wav'
	source_se = se_extractor(source_path, openvoice).to(device)

	# voice conversion
	encode_message = "@MyShell"
	openvoice.convert(
	    audio_src_path=source_path,
	    src_se=source_se,
	    tgt_se=target_se,
	    output_path='output.wav',
	    message=encode_message,
	)
	```

	- DreamVoice Plugin for DiffVC (Diffusion-based VC Model)

	```python
	from dreamvoice import DreamVoice

	# Initialize DreamVoice in plugin mode with CUDA device
	dreamvoice = DreamVoice(mode='plugin', device='cuda')
	# Description of the target voice
	prompt = 'young female voice, sounds young and cute'
	# Provide the path to the content audio and generate the converted audio
	gen_audio, sr = dreamvoice.genvc('examples/test1.wav', prompt)
	# Save the converted audio
	dreamvoice.save_audio('gen1.wav', gen_audio, sr)

	# Save the speaker embedding if you like the generated voice
	dreamvoice.save_spk_embed('voice_stash1.pt')
	# Load the saved speaker embedding
	dreamvoice.load_spk_embed('voice_stash1.pt')
	# Use the saved speaker embedding for another audio sample
	gen_audio2, sr = dreamvoice.simplevc('examples/test2.wav', use_spk_cache=True)
	dreamvoice.save_audio('gen2.wav', gen_audio2, sr)
	```
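The save/load pattern above extends naturally to converting several recordings with one generated voice. The following is a minimal sketch, not part of the DreamVoice package: `convert_batch` is a hypothetical helper that calls only the `genvc`, `simplevc`, and `save_audio` methods shown above. It assumes that `genvc` leaves the generated speaker embedding in the cache that `use_spk_cache=True` reads from, which the `save_spk_embed` call above suggests but does not guarantee.

```python
from pathlib import Path


def convert_batch(vc, prompt, source_paths, out_dir):
    """Convert several files with a single text-generated voice.

    `vc` is expected to be a DreamVoice object in plugin mode.
    Hypothetical helper; assumes genvc() caches the generated speaker.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, src in enumerate(source_paths):
        if i == 0:
            # First file: generate the voice from the text prompt
            audio, sr = vc.genvc(src, prompt)
        else:
            # Remaining files: reuse the cached speaker embedding
            audio, sr = vc.simplevc(src, use_spk_cache=True)
        out_path = out_dir / f"{Path(src).stem}_converted.wav"
        vc.save_audio(str(out_path), audio, sr)
        outputs.append(out_path)
    return outputs


# Usage (requires the DreamVoice models to be installed):
# from dreamvoice import DreamVoice
# dreamvoice = DreamVoice(mode='plugin', device='cuda')
# convert_batch(dreamvoice, 'young female voice, sounds young and cute',
#               ['examples/test1.wav', 'examples/test2.wav'], 'converted/')
```

Generating the voice once and reusing it keeps the timbre identical across files, whereas calling `genvc` for every file may produce a different voice each time.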

	# Training Guide

	1. Download VCTK and LibriTTS-R.
	2. Download the [DreamVoice dataset](https://haidog-yaqub.github.io/dreamvoice_demo/).
	3. Extract the speaker embeddings and cache them locally:
	```
	python dreamvoice/train_utils/prepare/prepare_se.py
	```
	4. Modify the training config and train your DreamVoice plugin:
	```
	cd dreamvoice/train_utils/src
	accelerate launch train.py
	```


	# Extra Features

	- End-to-end DreamVoice VC Model

	```python
	from dreamvoice import DreamVoice

	# Initialize DreamVoice in end-to-end mode with CUDA device
	dreamvoice = DreamVoice(mode='end2end', device='cuda')
	# Description of the target voice
	prompt = 'young female voice, sounds young and cute'
	# Provide the path to the content audio and generate the converted audio
	gen_end2end, sr = dreamvoice.genvc('examples/test1.wav', prompt)
	# Save the converted audio
	dreamvoice.save_audio('gen_end2end.wav', gen_end2end, sr)

	# Note: End-to-end mode does not support saving speaker embeddings
	# To use a voice generated in end-to-end mode, switch back to plugin mode
	# and extract the speaker embedding from the generated audio
	# Switch back to plugin mode
	dreamvoice = DreamVoice(mode='plugin', device='cuda')
	# Load the speaker audio from the previously generated file
	gen_end2end2, sr = dreamvoice.simplevc('examples/test2.wav', speaker_audio='gen_end2end.wav')
	# Save the new converted audio
	dreamvoice.save_audio('gen_end2end2.wav', gen_end2end2, sr)
	```

	- DiffVC (Diffusion-based VC Model)

	```python
	from dreamvoice import DreamVoice

	# Plugin mode can be used for traditional one-shot voice conversion
	dreamvoice = DreamVoice(mode='plugin', device='cuda')
	# Generate audio using traditional one-shot voice conversion
	gen_tradition, sr = dreamvoice.simplevc('examples/test1.wav', speaker_audio='examples/speaker.wav')
	# Save the converted audio
	dreamvoice.save_audio('gen_tradition.wav', gen_tradition, sr)
	```

	## Reference

	If you find the code useful for your research, please consider citing:

	```bibtex
	@article{hai2024dreamvoice,
	  title={DreamVoice: Text-Guided Voice Conversion},
	  author={Hai, Jiarui and Thakkar, Karan and Wang, Helin and Qin, Zengyi and Elhilali, Mounya},
	  journal={arXiv preprint arXiv:2406.16314},
	  year={2024}
	}
	```