add manifest and pretrained vocoders
Commit 0233e7e by wr · Parent(s): 604eca0
Files changed:
- README.md +45 -0
- manifest/.DS_Store +0 -0
- manifest/arctic_bdl_parallel_wavegan.v1/.DS_Store +0 -0
- manifest/arctic_bdl_parallel_wavegan.v1/config.yml +104 -0
- manifest/arctic_bdl_parallel_wavegan.v1/pwg-arctic-bdl-400000steps.pkl +3 -0
- manifest/arctic_bdl_parallel_wavegan.v1/stats.npy +3 -0
- manifest/arctic_clb_parallel_wavegan.v1/.DS_Store +0 -0
- manifest/arctic_clb_parallel_wavegan.v1/config.yml +104 -0
- manifest/arctic_clb_parallel_wavegan.v1/pwg-arctic-clb-400000steps.pkl +3 -0
- manifest/arctic_clb_parallel_wavegan.v1/stats.npy +3 -0
- manifest/arctic_rms_parallel_wavegan.v1/.DS_Store +0 -0
- manifest/arctic_rms_parallel_wavegan.v1/config.yml +104 -0
- manifest/arctic_rms_parallel_wavegan.v1/pwg-arctic-rms-400000steps.pkl +3 -0
- manifest/arctic_rms_parallel_wavegan.v1/stats.npy +3 -0
- manifest/arctic_slt_parallel_wavegan.v1/.DS_Store +0 -0
- manifest/arctic_slt_parallel_wavegan.v1/config.yml +94 -0
- manifest/arctic_slt_parallel_wavegan.v1/pwg-arctic-slt-400000steps.pkl +3 -0
- manifest/arctic_slt_parallel_wavegan.v1/stats.npy +3 -0
- manifest/dict.txt +3 -0
- manifest/test.tsv +3 -0
- manifest/train.tsv +3 -0
- manifest/utils/cmu_arctic_manifest.py +90 -0
- manifest/utils/make_tsv.sh +10 -0
- manifest/utils/prep_cmu_arctic_spkemb.py +68 -0
- manifest/utils/spec2wav.sh +0 -0
- manifest/valid.tsv +3 -0
README.md
CHANGED
@@ -1,3 +1,48 @@
 ---
 license: mit
+tags:
+- speech
+- text
+- cross-modal
+- unified model
+- self-supervised learning
+- SpeechT5
+- Voice Conversion
+datasets:
+- CMU ARCTIC
+- bdl
+- clb
+- rms
+- slt
 ---
+
+## SpeechT5 VC Manifest
+
+| [**GitHub**](https://github.com/microsoft/SpeechT5) | [**Hugging Face**](https://huggingface.co/mechanicalsea/speecht5-vc) |
+
+This manifest is an attempt to recreate the Voice Conversion recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). It was constructed from four [CMU ARCTIC](http://www.festvox.org/cmu_arctic/) speakers: bdl, clb, rms, and slt. There are 932 utterances for training, 100 utterances for validation, and 100 utterances for evaluation.
+
+### Requirements
+
+- [SpeechBrain](https://github.com/speechbrain/speechbrain) for extracting speaker embeddings.
+- [Parallel WaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN) for the vocoder.
+
+### Tools
+
+- [manifest/utils](./manifest/utils/) is used to extract speaker embeddings, generate the manifest, and apply the vocoder.
+- [manifest/arctic*](./manifest/) provides the pre-trained vocoder for each speaker.
+
+### Reference
+
+If you find our work useful in your research, please cite the following paper:
+
+```bibtex
+@inproceedings{ao-etal-2022-speecht5,
+    title = {{S}peech{T}5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing},
+    author = {Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu},
+    booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+    month = {May},
+    year = {2022},
+    pages={5723--5738},
+}
+```
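The split sizes in the README count utterance indices, not manifest rows. As a rough sketch, assuming each split lists every ordered pair of distinct speakers (which is what `manifest/utils/cmu_arctic_manifest.py` in this commit appears to do), the expected number of rows per split can be worked out as follows. The function name `rows_per_split` is hypothetical.

```python
from itertools import permutations

SPEAKERS = ["bdl", "clb", "rms", "slt"]
SPLIT_SIZES = {"train": 932, "valid": 100, "test": 100}


def rows_per_split(speakers, split_sizes):
    """Return the expected number of (source, target) rows in each split."""
    # Ordered pairs of distinct speakers: 4 speakers give 4 * 3 = 12 pairs.
    n_pairs = len(list(permutations(speakers, 2)))
    return {split: n_utts * n_pairs for split, n_utts in split_sizes.items()}


if __name__ == "__main__":
    for split, n_rows in rows_per_split(SPEAKERS, SPLIT_SIZES).items():
        print(f"{split}: {n_rows} rows")
```

Under that assumption the training split would hold 932 × 12 = 11184 rows and the validation and test splits 1200 rows each, which fits the identical pointer sizes recorded for `valid.tsv` and `test.tsv`.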
manifest/.DS_Store
ADDED
Binary file (8.2 kB).
manifest/arctic_bdl_parallel_wavegan.v1/.DS_Store
ADDED
Binary file (6.15 kB).
manifest/arctic_bdl_parallel_wavegan.v1/config.yml
ADDED
@@ -0,0 +1,104 @@
+allow_cache: true
+batch_max_steps: 15360
+batch_size: 10
+config: conf/parallel_wavegan.v1.yaml
+dev_dumpdir: dump/dev_bdl/norm
+dev_feats_scp: null
+dev_segments: null
+dev_wav_scp: null
+discriminator_grad_norm: 1
+discriminator_optimizer_params:
+  eps: 1.0e-06
+  lr: 5.0e-05
+  weight_decay: 0.0
+discriminator_params:
+  bias: true
+  conv_channels: 64
+  in_channels: 1
+  kernel_size: 3
+  layers: 10
+  nonlinear_activation: LeakyReLU
+  nonlinear_activation_params:
+    negative_slope: 0.2
+  out_channels: 1
+  use_weight_norm: true
+discriminator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+discriminator_train_start_steps: 100000
+distributed: false
+eval_interval_steps: 1000
+fft_size: 1024
+fmax: 7600
+fmin: 80
+format: npy
+generator_grad_norm: 10
+generator_optimizer_params:
+  eps: 1.0e-06
+  lr: 0.0001
+  weight_decay: 0.0
+generator_params:
+  aux_channels: 80
+  aux_context_window: 2
+  dropout: 0.0
+  gate_channels: 128
+  in_channels: 1
+  kernel_size: 3
+  layers: 30
+  out_channels: 1
+  residual_channels: 64
+  skip_channels: 64
+  stacks: 3
+  upsample_net: ConvInUpsampleNetwork
+  upsample_params:
+    upsample_scales:
+    - 4
+    - 4
+    - 4
+    - 4
+  use_weight_norm: true
+generator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+global_gain_scale: 1.0
+hop_size: 256
+lambda_adv: 4.0
+log_interval_steps: 100
+num_mels: 80
+num_save_intermediate_results: 4
+num_workers: 2
+outdir: exp/train_nodev_bdl_arctic_parallel_wavegan.v1
+pin_memory: true
+pretrain: ''
+rank: 0
+remove_short_samples: true
+resume: /mnt/default/v-junyiao/vc_vocoder2/train_nodev_bdl_arctic_parallel_wavegan.v1/checkpoint-135000steps.pkl
+sampling_rate: 16000
+save_interval_steps: 5000
+stft_loss_params:
+  fft_sizes:
+  - 1024
+  - 2048
+  - 512
+  hop_sizes:
+  - 120
+  - 240
+  - 50
+  win_lengths:
+  - 600
+  - 1200
+  - 240
+  window: hann_window
+train_dumpdir: dump/train_nodev_bdl/norm
+train_feats_scp: null
+train_max_steps: 400000
+train_segments: null
+train_wav_scp: null
+trim_frame_size: 2048
+trim_hop_size: 512
+trim_silence: false
+trim_threshold_in_db: 60
+verbose: 1
+version: 0.4.8
+win_length: null
+window: hann
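Several values in this config are tied together: the generator upsamples one mel frame into `hop_size` waveform samples, so the product of `upsample_scales` has to equal `hop_size`. The sketch below checks that relationship and derives the frame rate and segment length from the numbers in the config. The helper name `check_vocoder_config` is hypothetical and is not part of Parallel WaveGAN.

```python
import math

# Values copied from the config above.
config = {
    "sampling_rate": 16000,
    "hop_size": 256,
    "batch_max_steps": 15360,
    "upsample_scales": [4, 4, 4, 4],
}


def check_vocoder_config(cfg):
    """Check that the upsampling factors are consistent with the hop size."""
    total_upsampling = math.prod(cfg["upsample_scales"])
    # Each mel frame must expand to exactly hop_size waveform samples.
    assert total_upsampling == cfg["hop_size"], (total_upsampling, cfg["hop_size"])
    # A training segment should cover a whole number of frames.
    assert cfg["batch_max_steps"] % cfg["hop_size"] == 0
    return {
        "frames_per_second": cfg["sampling_rate"] / cfg["hop_size"],
        "frames_per_segment": cfg["batch_max_steps"] // cfg["hop_size"],
        "segment_seconds": cfg["batch_max_steps"] / cfg["sampling_rate"],
    }


if __name__ == "__main__":
    for name, value in check_vocoder_config(config).items():
        print(f"{name}: {value}")
```

With these settings the vocoder consumes 62.5 mel frames per second, and each training segment spans 60 frames, or 0.96 seconds of 16 kHz audio.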
manifest/arctic_bdl_parallel_wavegan.v1/pwg-arctic-bdl-400000steps.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f92557c6c61c2acc3a7f74533b291f03eae891963adee06d2e901922886c803c
+size 5918653
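The checkpoint itself is stored in Git LFS, so the diff only shows a three-line pointer: the spec version, the SHA-256 object id, and the size in bytes. A minimal sketch of parsing such a pointer and checking downloaded bytes against it might look like this. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f92557c6c61c2acc3a7f74533b291f03eae891963adee06d2e901922886c803c
size 5918653
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the raw bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
```

Checking the size first is cheap and catches the common case where a clone without LFS support leaves the small pointer file in place of the real checkpoint.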
manifest/arctic_bdl_parallel_wavegan.v1/stats.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7c186bca19c4ed7bc4d93dd7aacd3db9d8ca6186fd5d5e8d64b7b19cde03637c
+size 768
manifest/arctic_clb_parallel_wavegan.v1/.DS_Store
ADDED
Binary file (6.15 kB).
manifest/arctic_clb_parallel_wavegan.v1/config.yml
ADDED
@@ -0,0 +1,104 @@
+allow_cache: true
+batch_max_steps: 15360
+batch_size: 10
+config: conf/parallel_wavegan.v1.yaml
+dev_dumpdir: dump/dev_clb/norm
+dev_feats_scp: null
+dev_segments: null
+dev_wav_scp: null
+discriminator_grad_norm: 1
+discriminator_optimizer_params:
+  eps: 1.0e-06
+  lr: 5.0e-05
+  weight_decay: 0.0
+discriminator_params:
+  bias: true
+  conv_channels: 64
+  in_channels: 1
+  kernel_size: 3
+  layers: 10
+  nonlinear_activation: LeakyReLU
+  nonlinear_activation_params:
+    negative_slope: 0.2
+  out_channels: 1
+  use_weight_norm: true
+discriminator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+discriminator_train_start_steps: 100000
+distributed: false
+eval_interval_steps: 1000
+fft_size: 1024
+fmax: 7600
+fmin: 80
+format: npy
+generator_grad_norm: 10
+generator_optimizer_params:
+  eps: 1.0e-06
+  lr: 0.0001
+  weight_decay: 0.0
+generator_params:
+  aux_channels: 80
+  aux_context_window: 2
+  dropout: 0.0
+  gate_channels: 128
+  in_channels: 1
+  kernel_size: 3
+  layers: 30
+  out_channels: 1
+  residual_channels: 64
+  skip_channels: 64
+  stacks: 3
+  upsample_net: ConvInUpsampleNetwork
+  upsample_params:
+    upsample_scales:
+    - 4
+    - 4
+    - 4
+    - 4
+  use_weight_norm: true
+generator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+global_gain_scale: 1.0
+hop_size: 256
+lambda_adv: 4.0
+log_interval_steps: 100
+num_mels: 80
+num_save_intermediate_results: 4
+num_workers: 2
+outdir: exp/train_nodev_clb_arctic_parallel_wavegan.v1
+pin_memory: true
+pretrain: ''
+rank: 0
+remove_short_samples: true
+resume: /mnt/default/v-junyiao/vc_vocoder2/train_nodev_clb_arctic_parallel_wavegan.v1/checkpoint-135000steps.pkl
+sampling_rate: 16000
+save_interval_steps: 5000
+stft_loss_params:
+  fft_sizes:
+  - 1024
+  - 2048
+  - 512
+  hop_sizes:
+  - 120
+  - 240
+  - 50
+  win_lengths:
+  - 600
+  - 1200
+  - 240
+  window: hann_window
+train_dumpdir: dump/train_nodev_clb/norm
+train_feats_scp: null
+train_max_steps: 400000
+train_segments: null
+train_wav_scp: null
+trim_frame_size: 2048
+trim_hop_size: 512
+trim_silence: false
+trim_threshold_in_db: 60
+verbose: 1
+version: 0.4.8
+win_length: null
+window: hann
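The `generator_scheduler_params` block (`gamma: 0.5`, `step_size: 200000`) reads like a step-decay schedule. The config does not name the scheduler type, so the sketch below assumes the same semantics as PyTorch's `StepLR`: the rate is multiplied by `gamma` once every `step_size` scheduler steps. The function name `step_decay_lr` is hypothetical.

```python
import math


def step_decay_lr(base_lr, step, step_size=200_000, gamma=0.5):
    """Learning rate after `step` scheduler steps under step decay."""
    return base_lr * gamma ** (step // step_size)


GENERATOR_LR = 1.0e-4  # generator_optimizer_params.lr in the config above

if __name__ == "__main__":
    for step in (0, 199_999, 200_000, 399_999):
        print(f"step {step:>7}: lr = {step_decay_lr(GENERATOR_LR, step):.2e}")
```

Under that assumption the generator trains at 1e-4 for the first 200000 steps and at 5e-5 for the remainder of the 400000-step run. The same formula would apply to the discriminator, counted in its own scheduler steps, which may not start until `discriminator_train_start_steps` has passed.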
manifest/arctic_clb_parallel_wavegan.v1/pwg-arctic-clb-400000steps.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e80e448926a2b5b38de076fa8cc9e38589712d95ed08705bc7f242910c15ec4e
+size 5918653
manifest/arctic_clb_parallel_wavegan.v1/stats.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:23ef7d65275668849dc7c5bb876d78b8e3657f5e1ca299b76eb3ca6ce9c2370e
+size 768
manifest/arctic_rms_parallel_wavegan.v1/.DS_Store
ADDED
Binary file (6.15 kB).
manifest/arctic_rms_parallel_wavegan.v1/config.yml
ADDED
@@ -0,0 +1,104 @@
+allow_cache: true
+batch_max_steps: 15360
+batch_size: 10
+config: conf/parallel_wavegan.v1.yaml
+dev_dumpdir: dump/dev_rms/norm
+dev_feats_scp: null
+dev_segments: null
+dev_wav_scp: null
+discriminator_grad_norm: 1
+discriminator_optimizer_params:
+  eps: 1.0e-06
+  lr: 5.0e-05
+  weight_decay: 0.0
+discriminator_params:
+  bias: true
+  conv_channels: 64
+  in_channels: 1
+  kernel_size: 3
+  layers: 10
+  nonlinear_activation: LeakyReLU
+  nonlinear_activation_params:
+    negative_slope: 0.2
+  out_channels: 1
+  use_weight_norm: true
+discriminator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+discriminator_train_start_steps: 100000
+distributed: false
+eval_interval_steps: 1000
+fft_size: 1024
+fmax: 7600
+fmin: 80
+format: npy
+generator_grad_norm: 10
+generator_optimizer_params:
+  eps: 1.0e-06
+  lr: 0.0001
+  weight_decay: 0.0
+generator_params:
+  aux_channels: 80
+  aux_context_window: 2
+  dropout: 0.0
+  gate_channels: 128
+  in_channels: 1
+  kernel_size: 3
+  layers: 30
+  out_channels: 1
+  residual_channels: 64
+  skip_channels: 64
+  stacks: 3
+  upsample_net: ConvInUpsampleNetwork
+  upsample_params:
+    upsample_scales:
+    - 4
+    - 4
+    - 4
+    - 4
+  use_weight_norm: true
+generator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+global_gain_scale: 1.0
+hop_size: 256
+lambda_adv: 4.0
+log_interval_steps: 100
+num_mels: 80
+num_save_intermediate_results: 4
+num_workers: 2
+outdir: exp/train_nodev_rms_arctic_parallel_wavegan.v1
+pin_memory: true
+pretrain: ''
+rank: 0
+remove_short_samples: true
+resume: /mnt/default/v-junyiao/vc_vocoder2/train_nodev_rms_arctic_parallel_wavegan.v1/checkpoint-110000steps.pkl
+sampling_rate: 16000
+save_interval_steps: 5000
+stft_loss_params:
+  fft_sizes:
+  - 1024
+  - 2048
+  - 512
+  hop_sizes:
+  - 120
+  - 240
+  - 50
+  win_lengths:
+  - 600
+  - 1200
+  - 240
+  window: hann_window
+train_dumpdir: dump/train_nodev_rms/norm
+train_feats_scp: null
+train_max_steps: 400000
+train_segments: null
+train_wav_scp: null
+trim_frame_size: 2048
+trim_hop_size: 512
+trim_silence: false
+trim_threshold_in_db: 60
+verbose: 1
+version: 0.4.8
+win_length: null
+window: hann
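The three lists under `stft_loss_params` are parallel: the i-th FFT size, hop size, and window length together describe one STFT resolution of the multi-resolution loss. The sketch below zips them into resolutions and, assuming a centered STFT (the default behaviour of `torch.stft`), estimates how many frames each resolution produces for one 15360-sample training segment. `StftResolution` and `num_frames` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class StftResolution(NamedTuple):
    fft_size: int
    hop_size: int
    win_length: int


# Values copied from stft_loss_params in the config above.
stft_loss_params = {
    "fft_sizes": [1024, 2048, 512],
    "hop_sizes": [120, 240, 50],
    "win_lengths": [600, 1200, 240],
}


def resolutions(params):
    """Pair up the parallel lists into one tuple per STFT resolution."""
    triples = zip(params["fft_sizes"], params["hop_sizes"], params["win_lengths"])
    return [StftResolution(*triple) for triple in triples]


def num_frames(n_samples, hop_size):
    """Frame count of a centered STFT over n_samples samples."""
    return 1 + n_samples // hop_size


if __name__ == "__main__":
    for res in resolutions(stft_loss_params):
        window_ms = 1000 * res.win_length / 16000  # sampling_rate is 16000
        print(res, f"window={window_ms} ms", f"frames={num_frames(15360, res.hop_size)}")
```

At 16 kHz the window lengths correspond to 37.5 ms, 75 ms, and 15 ms, so the loss compares spectra at a medium, a coarse, and a fine time resolution.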
manifest/arctic_rms_parallel_wavegan.v1/pwg-arctic-rms-400000steps.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d70ed1c03eada2e8616731292a885e9bbb8406f5859afee5003704725f23d876
+size 5918653
manifest/arctic_rms_parallel_wavegan.v1/stats.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3332906cb47d19988579ddb6c513a7f5fd3bb4ba3b1704c1327e11726a47cac8
+size 768
manifest/arctic_slt_parallel_wavegan.v1/.DS_Store
ADDED
Binary file (6.15 kB).
manifest/arctic_slt_parallel_wavegan.v1/config.yml
ADDED
@@ -0,0 +1,94 @@
+batch_max_steps: 15360
+batch_size: 10
+config: conf/parallel_wavegan.v1.yaml
+dev_dumpdir: dump/dev/norm
+discriminator_grad_norm: 1
+discriminator_optimizer_params:
+  eps: 1.0e-06
+  lr: 5.0e-05
+  weight_decay: 0.0
+discriminator_params:
+  bias: true
+  conv_channels: 64
+  in_channels: 1
+  kernel_size: 3
+  layers: 10
+  nonlinear_activation: LeakyReLU
+  nonlinear_activation_params:
+    negative_slope: 0.2
+  out_channels: 1
+  use_weight_norm: true
+discriminator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+discriminator_train_start_steps: 100000
+eval_interval_steps: 1000
+fft_size: 1024
+fmax: 7600
+fmin: 80
+format: npy
+# hdf5
+generator_grad_norm: 10
+generator_optimizer_params:
+  eps: 1.0e-06
+  lr: 0.0001
+  weight_decay: 0.0
+generator_params:
+  aux_channels: 80
+  aux_context_window: 2
+  dropout: 0.0
+  gate_channels: 128
+  in_channels: 1
+  kernel_size: 3
+  layers: 30
+  out_channels: 1
+  residual_channels: 64
+  skip_channels: 64
+  stacks: 3
+  upsample_net: ConvInUpsampleNetwork
+  upsample_params:
+    upsample_scales:
+    - 4
+    - 4
+    - 4
+    - 4
+  use_weight_norm: true
+generator_scheduler_params:
+  gamma: 0.5
+  step_size: 200000
+global_gain_scale: 1.0
+hop_size: 256
+lambda_adv: 4.0
+log_interval_steps: 100
+num_mels: 80
+num_save_intermediate_results: 4
+num_workers: 8
+outdir: exp/train_nodev_arctic_slt_parallel_wavegan.v1
+pin_memory: true
+remove_short_samples: true
+resume: exp/train_nodev_arctic_slt_parallel_wavegan.v1/checkpoint-300000steps.pkl
+sampling_rate: 16000
+save_interval_steps: 5000
+stft_loss_params:
+  fft_sizes:
+  - 1024
+  - 2048
+  - 512
+  hop_sizes:
+  - 120
+  - 240
+  - 50
+  win_lengths:
+  - 600
+  - 1200
+  - 240
+  window: hann_window
+train_dumpdir: dump/train_nodev/norm
+train_max_steps: 400000
+trim_frame_size: 2048
+trim_hop_size: 512
+trim_silence: false
+trim_threshold_in_db: 60
+verbose: 0
+win_length: null
+window: hann
manifest/arctic_slt_parallel_wavegan.v1/pwg-arctic-slt-400000steps.pkl
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:477686935b56f0eed684de9a31fb0f35600e4ce84b81e488c2b850fd07e630db
+size 5918525
manifest/arctic_slt_parallel_wavegan.v1/stats.npy
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8af46bfcde0d79c2d3936e25fbc7b59fb5043f064fb9fa53cd2323c8ea64abe1
+size 768
manifest/dict.txt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:036438c7cb5fc860b1d1066a3b111542515b1d4ac1f5a79a15a2322e8f79f402
+size 309
manifest/test.tsv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9126dfb852be724b1d595ea69dc2adf96eaf2dd5ee2fe113a30229de3539491c
+size 170418
manifest/train.tsv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:067e049d317083e49ae22c7f5582a28253c1b24ba7988cb95b362eb1938e3553
+size 1588164
manifest/utils/cmu_arctic_manifest.py
ADDED
@@ -0,0 +1,90 @@
+import argparse
+import os
+
+from torchaudio.datasets import CMUARCTIC
+from tqdm import tqdm
+
+
+SPLITS = {
+    "train": list(range(0, 932)),
+    "valid": list(range(932, 1032)),
+    "test": list(range(1032, 1132)),
+}
+
+
+def get_parser():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "root", metavar="DIR", help="root directory containing wav files to index"
+    )
+    parser.add_argument(
+        "--dest", default=".", type=str, metavar="DIR", help="output directory"
+    )
+    parser.add_argument(
+        "--source", default="bdl,clb,slt,rms", type=str, help="Source voices, chosen from slt, clb, bdl, rms."
+    )
+    parser.add_argument(
+        "--target", default="bdl,clb,slt,rms", type=str, help="Target voices, chosen from slt, clb, bdl, rms."
+    )
+    parser.add_argument(
+        "--splits", default="932,100,100", type=str, help="Sizes of the train,valid,test splits, separated by commas."
+    )
+    parser.add_argument(
+        "--wav-root", default=None, type=str, metavar="DIR", help="saved waveform root directory for tsv"
+    )
+    parser.add_argument(
+        "--spkemb-npy-dir", required=True, type=str, help="speaker embedding directory"
+    )
+    return parser
+
+def main(args):
+    dest_dir = args.dest
+    wav_root = args.wav_root
+    if not os.path.exists(dest_dir):
+        os.makedirs(dest_dir)
+
+    source = args.source.split(",")
+    target = args.target.split(",")
+    spks = sorted(list(set(source + target)))
+    datasets = {}
+
+    datasets["slt"] = CMUARCTIC(args.root, url="slt", folder_in_archive="ARCTIC", download=False)
+    for spk in spks:
+        if spk != "slt":
+            datasets[spk] = CMUARCTIC(args.root, url=spk, folder_in_archive="ARCTIC", download=False)
+            datasets[spk]._walker = list(datasets["slt"]._walker) # some text sentences are missing
+    if "slt" not in spks:
+        del datasets["slt"]
+
+    num_splits = [int(n_split) for n_split in args.splits.split(',')]
+    assert sum(num_splits) == 1132, f"Missing utterances: {sum(num_splits)} != 1132"
+
+    tsv = {}
+    for split in SPLITS.keys():
+        tsv[split] = open(os.path.join(dest_dir, f"{split}.tsv"), "w")
+        print(wav_root, file=tsv[split])
+
+    for split, indices in SPLITS.items():
+        for i in tqdm(indices, desc=f"[{'-'.join(spks)}]tsv/wav/spk"):
+            for src_spk in source:
+                for tgt_spk in target:
+                    if src_spk == tgt_spk: continue
+                    # wav, sample_rate, utterance, utt_no
+                    src_i = datasets[src_spk][i]
+                    tgt_i = datasets[tgt_spk][i]
+                    assert src_i[1] == tgt_i[1], f"{src_i[1]}-{tgt_i[1]}"
+                    assert src_i[3] == tgt_i[3], f"{src_i[3]}-{tgt_i[3]}"
+                    src_wav = os.path.join(os.path.basename(datasets[src_spk]._path), datasets[src_spk]._folder_audio, f"arctic_{src_i[3]}.wav")
+                    src_nframes = src_i[0].shape[-1]
+                    tgt_wav = os.path.join(os.path.basename(datasets[tgt_spk]._path), datasets[tgt_spk]._folder_audio, f"arctic_{tgt_i[3]}.wav")
+                    tgt_nframes = tgt_i[0].shape[-1]
+                    tgt_spkemb = os.path.join(args.spkemb_npy_dir, f"{os.path.basename(datasets[tgt_spk]._path)}-{datasets[tgt_spk]._folder_audio}-arctic_{tgt_i[3]}.npy")
+                    print(f"{src_wav}\t{src_nframes}\t{tgt_wav}\t{tgt_nframes}\t{tgt_spkemb}", file=tsv[split])
+    for split in tsv.keys():
+        tsv[split].close()
+
+
+if __name__ == "__main__":
+    parser = get_parser()
+    args = parser.parse_args()
+    main(args)
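Judging from the `print` calls in the script above, each generated `.tsv` starts with one line holding the waveform root, followed by rows of five tab-separated fields: source wav, source length in samples, target wav, target length in samples, and the target speaker-embedding path. A minimal reader for that layout could look like the sketch below. `VCPair` and `read_manifest` are hypothetical names, and the sample row uses made-up placeholder values rather than data from the real manifest.

```python
import io
from dataclasses import dataclass


@dataclass
class VCPair:
    src_wav: str
    src_nframes: int
    tgt_wav: str
    tgt_nframes: int
    tgt_spkemb: str


def read_manifest(handle):
    """Read a voice-conversion manifest: a root line, then five-column rows."""
    root = handle.readline().rstrip("\n")
    pairs = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate a trailing blank line
        src_wav, src_n, tgt_wav, tgt_n, spkemb = line.split("\t")
        pairs.append(VCPair(src_wav, int(src_n), tgt_wav, int(tgt_n), spkemb))
    return root, pairs


# Placeholder content for illustration only; the lengths are invented.
SAMPLE = (
    "/path/to/ARCTIC\n"
    "cmu_us_bdl_arctic/wav/arctic_a0001.wav\t51360\t"
    "cmu_us_clb_arctic/wav/arctic_a0001.wav\t56320\t"
    "spkrec-xvect/cmu_us_clb_arctic-wav-arctic_a0001.npy\n"
)

if __name__ == "__main__":
    root, pairs = read_manifest(io.StringIO(SAMPLE))
    print(root, len(pairs), pairs[0].tgt_spkemb)
```

Keeping the root on its own line lets the rows store relative paths, so the same manifest can be reused after the audio is moved by editing a single line.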
manifest/utils/make_tsv.sh
ADDED
@@ -0,0 +1,10 @@
+#!/bin/bash
+# bash utils/make_tsv.sh /root/data/cmu_arctic/ /root/data/cmu_arctic/cmu_arctic_finetuning_meta /opt/tiger/ARCTIC
+root=$1
+dest=$2
+wav_root=$3
+spkemb_split=$4
+if [ -z "${spkemb_split}" ]; then
+    spkemb_split=spkrec-xvect
+fi
+python utils/cmu_arctic_manifest.py ${root} --dest ${dest} --wav-root ${wav_root} --spkemb-npy-dir ${spkemb_split}
manifest/utils/prep_cmu_arctic_spkemb.py
ADDED
@@ -0,0 +1,68 @@
+import os
+import glob
+import numpy
+import argparse
+import torchaudio
+from speechbrain.pretrained import EncoderClassifier
+import torch
+from tqdm import tqdm
+import torch.nn.functional as F
+
+spk_model = {
+    "speechbrain/spkrec-xvect-voxceleb": 512,
+    "speechbrain/spkrec-ecapa-voxceleb": 192,
+}
+
+def f2embed(wav_file, classifier, size_embed):
+    signal, fs = torchaudio.load(wav_file)
+    assert fs == 16000, fs
+    with torch.no_grad():
+        embeddings = classifier.encode_batch(signal)
+        embeddings = F.normalize(embeddings, dim=2)
+        embeddings = embeddings.squeeze().cpu().numpy()
+    assert embeddings.shape[0] == size_embed, embeddings.shape[0]
+    return embeddings
+
+def process(args):
+    wavlst = []
+    for split in args.splits.split(","):
+        wav_dir = os.path.join(args.arctic_root, split)
+        wavlst_split = glob.glob(os.path.join(wav_dir, "wav", "*.wav"))
+        print(f"{split} {len(wavlst_split)} utterances.")
+        wavlst.extend(wavlst_split)
+
+    spkemb_root = args.output_root
+    if not os.path.exists(spkemb_root):
+        print(f"Create speaker embedding directory: {spkemb_root}")
+        os.mkdir(spkemb_root)
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    classifier = EncoderClassifier.from_hparams(source=args.speaker_embed, run_opts={"device": device}, savedir=os.path.join('/tmp', args.speaker_embed))
+    size_embed = spk_model[args.speaker_embed]
+    for utt_i in tqdm(wavlst, total=len(wavlst), desc="Extract"):
+        # TODO rename speaker embedding
+        utt_id = "-".join(utt_i.split("/")[-3:]).replace(".wav", "")
+        utt_emb = f2embed(utt_i, classifier, size_embed)
+        numpy.save(os.path.join(spkemb_root, f"{utt_id}.npy"), utt_emb)
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--arctic-root", "-i", required=True, type=str, help="CMU ARCTIC root directory.")
+    parser.add_argument("--output-root", "-o", required=True, type=str, help="Output directory.")
+    parser.add_argument("--speaker-embed", "-s", type=str, required=True, choices=["speechbrain/spkrec-xvect-voxceleb", "speechbrain/spkrec-ecapa-voxceleb"],
+                        help="Pretrained model for extracting speaker embeddings.")
+    parser.add_argument("--splits", type=str, help="Splits of the four speakers, separated by commas.",
+                        default="cmu_us_bdl_arctic,cmu_us_clb_arctic,cmu_us_rms_arctic,cmu_us_slt_arctic")
+    args = parser.parse_args()
+    print(f"Loading utterances from {args.arctic_root}/{args.splits}, "
+        + f"Save speaker embedding 'npy' to {args.output_root}, "
+        + f"Using speaker model {args.speaker_embed} with {spk_model[args.speaker_embed]} size.")
+    process(args)
+
+if __name__ == "__main__":
+    """
+    python utils/prep_cmu_arctic_spkemb.py \
+        -i /root/data/cmu_arctic/CMUARCTIC \
+        -o /root/data/cmu_arctic/CMUARCTIC/spkrec-xvect \
+        -s speechbrain/spkrec-xvect-voxceleb
+    """
+    main()
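The embedding file name is what links this script to the manifest script: the extractor derives `utt_id` from the last three components of the wav path, while the manifest script rebuilds the same name from the speaker directory, the audio folder, and the utterance number. The sketch below mirrors both rules to show that they agree. It assumes the audio folder is named `wav` and that speaker directories look like `cmu_us_bdl_arctic`, as the default `--splits` value suggests. Both function names are hypothetical.

```python
def spkemb_name_from_wav(wav_path):
    """Mirror the extractor: join the last three path parts with '-' and swap the suffix."""
    utt_id = "-".join(wav_path.split("/")[-3:]).replace(".wav", "")
    return f"{utt_id}.npy"


def spkemb_name_from_parts(speaker_dir, audio_folder, utt_no):
    """Mirror the manifest script: rebuild the name from its three components."""
    return f"{speaker_dir}-{audio_folder}-arctic_{utt_no}.npy"


if __name__ == "__main__":
    # Hypothetical path, used only to exercise the naming rule.
    wav = "/data/CMUARCTIC/cmu_us_bdl_arctic/wav/arctic_a0001.wav"
    print(spkemb_name_from_wav(wav))
    print(spkemb_name_from_parts("cmu_us_bdl_arctic", "wav", "a0001"))
```

Because the extractor splits on `/` explicitly, the rule only holds for POSIX-style paths. A different directory depth or separator would produce names the manifest script does not expect.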
manifest/utils/spec2wav.sh
ADDED
File without changes
manifest/valid.tsv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a0d3fc2569593894864f881f2027c46b9ea39fcb01f0e6cdbacc8213dfa8dd6f
+size 170418