Fine-tuning on a single speaker
Hey, I am getting the error "DacModel.encode() got an unexpected keyword argument 'bandwidth'" while trying to fine-tune the model on a single speaker.
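The error means that `encode()` on the audio encoder that actually got loaded has no `bandwidth` parameter, while the caller passes one. The snippet below is a minimal sketch of a local workaround I am considering, not a confirmed fix. It assumes the training script calls `encode(..., bandwidth=...)` on the loaded audio encoder. The helper `call_with_supported_kwargs` and the two stand-in encoder classes are hypothetical. They only mimic the two kinds of signature involved and are not the real Parler-TTS or transformers models.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Hypothetical helper: call fn, dropping keyword arguments its signature cannot accept."""
    params = inspect.signature(fn).parameters
    # If fn takes **kwargs, it can accept anything, so pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)
    keyword_ok = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    supported = {k: v for k, v in kwargs.items() if k in params and params[k].kind in keyword_ok}
    return fn(*args, **supported)


# Stand-in classes (not the real models) with and without a `bandwidth` parameter.
class EncoderWithBandwidth:
    def encode(self, input_values, bandwidth=None):
        return {"audio_codes": input_values, "bandwidth": bandwidth}


class EncoderWithoutBandwidth:
    def encode(self, input_values, n_quantizers=None):
        return {"audio_codes": input_values, "n_quantizers": n_quantizers}


if __name__ == "__main__":
    # Calling EncoderWithoutBandwidth().encode(..., bandwidth=6) directly raises the same kind of TypeError.
    for enc in (EncoderWithBandwidth(), EncoderWithoutBandwidth()):
        print(type(enc).__name__, call_with_supported_kwargs(enc.encode, [0.0, 0.1], bandwidth=6))
```

This only sidesteps the TypeError. It does not guarantee that the codes returned by a different encoder class have the shape or format the training script expects. Loading the audio encoder class the script was written for is probably the safer route.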
I use the following configuration:
!accelerate launch ./training/run_parler_tts_training.py \
--model_name_or_path "HelpingAI/HelpingAI-TTS-v1" \
--feature_extractor_name "ylacombe/dac_44khz" \
--description_tokenizer_name "google/flan-t5-large" \
--prompt_tokenizer_name "google/flan-t5-large" \
--report_to "tensorboard" \
--overwrite_output_dir true \
--train_dataset_name "man-ml/my_audio_syn" \
--train_metadata_dataset_name "man-ml/my_audio_syn-Emma" \
--train_dataset_config_name "default" \
--train_split_name "train" \
--eval_dataset_name "man-ml/my_audio_syn" \
--eval_metadata_dataset_name "man-ml/my_audio_syn-Emma" \
--eval_dataset_config_name "default" \
--eval_split_name "train" \
--max_eval_samples 8 \
--per_device_eval_batch_size 8 \
--target_audio_column_name "audio" \
--description_column_name "text_description" \
--prompt_column_name "text" \
--max_duration_in_seconds 20 \
--min_duration_in_seconds 2.0 \
--max_text_length 400 \
--preprocessing_num_workers 2 \
--do_train true \
--num_train_epochs 2 \
--gradient_accumulation_steps 18 \
--gradient_checkpointing true \
--per_device_train_batch_size 2 \
--learning_rate 0.0001 \
--adam_beta1 0.9 \
--adam_beta2 0.99 \
--weight_decay 0.01 \
--lr_scheduler_type "constant_with_warmup" \
--warmup_steps 50 \
--logging_steps 2 \
--freeze_text_encoder true \
--audio_encoder_per_device_batch_size 5 \
--dtype "float16" \
--seed 456 \
--output_dir "./output_dir_training/" \
--temporary_save_to_disk "./audio_code_tmp/" \
--save_to_disk "./tmp_dataset_audio/" \
--dataloader_num_workers 2 \
--do_eval \
--predict_with_generate \
--include_inputs_for_metrics \
--group_by_length true
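If the multi-line `!` cell is awkward to maintain, the same launch can also be built from Python as an explicit argument list and passed to `subprocess.run`. This is a sketch under stated assumptions. The dict holds only a subset of the flags above for brevity, and the rest would need to be added before running. The names `TRAINING_ARGS`, `build_launch_cmd`, and `DRY_RUN` are hypothetical.

```python
import shlex
import subprocess
from typing import Dict, List, Optional

# Subset of the flags from the command above; None marks a valueless flag such as --do_eval.
TRAINING_ARGS: Dict[str, Optional[str]] = {
    "model_name_or_path": "HelpingAI/HelpingAI-TTS-v1",
    "feature_extractor_name": "ylacombe/dac_44khz",
    "description_tokenizer_name": "google/flan-t5-large",
    "prompt_tokenizer_name": "google/flan-t5-large",
    "train_dataset_name": "man-ml/my_audio_syn",
    "train_metadata_dataset_name": "man-ml/my_audio_syn-Emma",
    "dtype": "float16",
    "output_dir": "./output_dir_training/",
    "do_eval": None,
}

DRY_RUN = True  # print the command instead of launching training


def build_launch_cmd(script: str, args: Dict[str, Optional[str]]) -> List[str]:
    """Turn a flag dict into an `accelerate launch` argument list (no shell quoting needed)."""
    cmd = ["accelerate", "launch", script]
    for name, value in args.items():
        cmd.append(f"--{name}")
        if value is not None:
            cmd.append(value)
    return cmd


if __name__ == "__main__":
    cmd = build_launch_cmd("./training/run_parler_tts_training.py", TRAINING_ARGS)
    if DRY_RUN:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
```

Passing a list instead of one long string avoids shell line-continuation and quoting issues in notebook cells.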
Any help would be appreciated.