Commit c1b80c0
Parent(s): af0842b
Update CSV logger

Files changed:
- README.md +22 -12
- cfg/config.yaml +4 -1
- cfg/exp/chain_inference_aug_classifier.yaml +0 -1
- remfx/callbacks.py +1 -3
- remfx/datasets.py +9 -3
- remfx/models.py +0 -16
- remfx/utils.py +4 -36
- scripts/download.py +43 -34
- scripts/generate_dataset.py +15 -0
- setup.py +1 -5
README.md
CHANGED
@@ -1,7 +1,7 @@
 # General Purpose Audio Effect Removal
 Removing multiple audio effects from multiple sources using compositional audio effect removal and source separation and speech enhancement models.
 
-This repo contains the code for the paper [General Purpose Audio Effect Removal](https://arxiv.org/abs/2110.00484). (Todo: Link broken, Add video, Add img)
+This repo contains the code for the paper [General Purpose Audio Effect Removal](https://arxiv.org/abs/2110.00484). (Todo: Link broken, Add video, Add img, citation)
 
 
 
@@ -9,7 +9,7 @@ This repo contains the code for the paper [General Purpose Audio Effect Removal]
 ```
 git clone https://github.com/mhrice/RemFx.git
 git submodule update --init --recursive
-pip install . umx
+pip install -e . ./umx
 ```
 # Usage
 This repo can be used for many different tasks. Here are some examples.
@@ -24,11 +24,11 @@ wget https://zenodo.org/record/8183649/files/RemFX_eval_dataset.zip?download=1 -
 unzip RemFX_eval_dataset.zip
 ```
 
-## Download the datasets
+## Download the starter datasets
 ```
 python scripts/download.py vocalset guitarset idmt-smt-bass idmt-smt-drums
 ```
-By default, the datasets are downloaded to `./data/remfx-data`. To change this, pass `--output_dir={path/to/datasets}` to `download.py`
+By default, the starter datasets are downloaded to `./data/remfx-data`. To change this, pass `--output_dir={path/to/datasets}` to `download.py`
 
 Then set the dataset root :
 ```
@@ -36,7 +36,7 @@ export DATASET_ROOT={path/to/datasets}
 ```
 
 ## Training
-Before training, it is important that you have downloaded the datasets (see above) and set DATASET_ROOT.
+Before training, it is important that you have downloaded the starter datasets (see above) and set DATASET_ROOT.
 This project uses the [pytorch-lightning](https://www.pytorchlightning.ai/index.html) framework and [hydra](https://hydra.cc/) for configuration management. All experiments are defined in `cfg/exp/`. To train with an existing experiment run
 ```
 python scripts/train.py +exp={experiment_name}
@@ -55,13 +55,17 @@ Here are some selected experiment types from the paper, which use different data
 To change the configuration, simply edit the experiment file, or override the configuration on the command line. A description of some of these variables is in the Misc. section below.
 You can also create a custom experiment by creating a new experiment file in `cfg/exp/` and overriding the default parameters in `config.yaml`.
 
-At the end of training, the train script will automatically evaluate the test set using the best checkpoint (by validation loss). To evaluate a specific checkpoint, run
+At the end of training, the train script will automatically evaluate the test set using the best checkpoint (by validation loss). If epoch 0 is not finished, it will throw an error. To evaluate a specific checkpoint, run
 
 ```
 python test.py +exp={experiment_name} ckpt_path={path/to/checkpoint}
 ```
 
-
+The checkpoints will be saved in `./logs/ckpts/{timestamp}`
+Metrics and hyperparams will be logged in `./lightning_logs/{timestamp}`
+
+By default, the dataset needed for the experiment is generated before training.
+If you have generated the dataset separately (see Generate datasets used in the paper), be sure to set `render_files=False` in the config or command-line, and set `render_root={path_to_dataset}` if it is in a custom location.
 
 Also note that the training assumes you have a GPU. To train on CPU, set `accelerator=null` in the config or command-line.
 
@@ -86,16 +90,21 @@ Download checkpoints from [here](https://zenodo.org/record/8179396), or see the
 
 
 ## Generate datasets used in the paper
-
+The datasets used in the experiments are custom-generated from the starter datasets. In short, for each training/val/testing example, we select a random 5.5s segment from one of the starter datasets and apply a random number of effects to it. The number of effects applied is controlled by the `num_kept_effects` and `num_removed_effects` parameters. The effects applied are controlled by the `effects_to_keep` and `effects_to_remove` parameters.
+
+Before generating datasets, it is important that you have downloaded the starter datasets (see above) and set DATASET_ROOT.
 
-To generate one of the datasets used in the paper,
+To generate one of the datasets used in the paper, use one of the experiments defined in `cfg/exp/`.
+For example, to generate the `chorus` FXAug dataset, which includes files with 5 possible effects, up to 4 kept effects (distortion, reverb, compression, delay), and 1 removed effect (chorus), run
 ```
-python scripts/
+python scripts/generate_dataset.py +exp=chorus_aug
 ```
 
 See the Misc. section below for a description of the parameters.
 By default, files are rendered to `{render_root} / processed / {string_of_effects} / {train|val|test}`.
 
+If training, this process will be done automatically at the start of training. To disable this, set `render_files=False` in the config or command-line, and set `render_root={path_to_dataset}` if it is in a custom location.
+
 ## Evaluate with a custom directory
 Assumes directory is structured as
 - root
@@ -120,15 +129,16 @@ python scripts/chain_inference.py +exp=chain_inference_custom
 
 # Misc.
 ## Experimental parameters
-Some relevant training parameters descriptions
+Some relevant dataset/training parameter descriptions:
 - `num_kept_effects={[min, max]}` range of <b> Kept </b> effects to apply to each file. Inclusive.
 - `num_removed_effects={[min, max]}` range of <b> Removed </b> effects to apply to each file. Inclusive.
 - `model={model}` architecture to use (see 'Effect Removal Models/Effect Classification Models')
-- `effects_to_keep={[effect]}` Effects to apply but not remove (see 'Effects')
+- `effects_to_keep={[effect]}` Effects to apply but not remove (see 'Effects'). Used for FXAug.
 - `effects_to_remove={[effect]}` Effects to remove (see 'Effects')
 - `accelerator=null/'gpu'` Use GPU (1 device) (default: null)
 - `render_files=True/False` Render files. Disable to skip rendering stage (default: True)
 - `render_root={path/to/dir}`. Root directory to render files to (default: ./data)
+- `datamodule.train_batch_size={batch_size}`. Change batch size (default: varies)
 
 ### Effect Removal Models
 - `umx`
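The README changes above drive everything through the Hydra CLI (`python scripts/train.py +exp={experiment_name}` plus command-line overrides). As a rough illustration, and not part of this commit, the same composition can presumably be reproduced with Hydra's compose API; the sketch below assumes it lives in a file next to `scripts/` so that `../cfg` resolves to the repo's config directory, and uses the `chorus_aug` experiment named later in the README.

```python
# Illustrative sketch only: compose an experiment config programmatically,
# mirroring `python scripts/train.py +exp=chorus_aug render_files=False accelerator=null`.
from hydra import compose, initialize

with initialize(version_base=None, config_path="../cfg"):
    cfg = compose(
        config_name="config",
        overrides=["+exp=chorus_aug", "render_files=False", "accelerator=null"],
    )
    # render_root / render_files are the parameters documented in the Misc. section above.
    print(cfg.render_root, cfg.render_files)
```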
cfg/config.yaml
CHANGED
@@ -63,7 +63,7 @@ datamodule:
     shuffle_removed_effects: ${shuffle_removed_effects}
     render_files: ${render_files}
     render_root: ${render_root}
-    parallel:
+    parallel: False
   val_dataset:
     _target_: remfx.datasets.EffectDataset
     total_chunks: 1000
@@ -80,6 +80,7 @@ datamodule:
     shuffle_removed_effects: ${shuffle_removed_effects}
     render_files: ${render_files}
     render_root: ${render_root}
+    parallel: False
   test_dataset:
     _target_: remfx.datasets.EffectDataset
     total_chunks: 1000
@@ -96,6 +97,7 @@ datamodule:
     shuffle_removed_effects: ${shuffle_removed_effects}
     render_files: ${render_files}
     render_root: ${render_root}
+    parallel: False
 
   train_batch_size: 16
   test_batch_size: 1
@@ -115,6 +117,7 @@ datamodule:
 logger:
   _target_: pytorch_lightning.loggers.CSVLogger
   save_dir: "."
+  version: ${now:%Y-%m-%d-%H-%M-%S}
 
 trainer:
   _target_: pytorch_lightning.Trainer
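The logger change above gives the `CSVLogger` a timestamped `version`, which is what produces the `./lightning_logs/{timestamp}` path now mentioned in the README. A small sketch of what that config resolves to at runtime (the timestamp string below just stands in for Hydra's `${now:...}` interpolation):

```python
# Sketch only: the object the updated logger config instantiates.
from pytorch_lightning.loggers import CSVLogger

logger = CSVLogger(save_dir=".", version="2023-07-25-12-00-00")
# CSVLogger writes under {save_dir}/{name}/{version}; name defaults to "lightning_logs",
# so metrics end up in ./lightning_logs/2023-07-25-12-00-00/metrics.csv
print(logger.log_dir)
```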
cfg/exp/chain_inference_aug_classifier.yaml
CHANGED
@@ -76,7 +76,6 @@ ckpts:
   RandomPedalboardDelay:
     model: ${dcunet}
     ckpt_path: "ckpts/dcunet_delay_aug.ckpt"
-
 inference_effects_ordering:
   - "RandomPedalboardDistortion"
   - "RandomPedalboardCompressor"
remfx/callbacks.py
CHANGED
@@ -42,9 +42,7 @@ class AudioCallback(Callback):
         )
         self.log_train_audio = False
 
-    def on_validation_batch_start(
-        self, trainer, pl_module, batch, batch_idx, dataloader_idx
-    ):
+    def on_validation_batch_start(self, trainer, pl_module, batch, batch_idx):
         x, target, _, rem_fx_labels = batch
         # Only run on first batch
         if batch_idx == 0 and self.log_audio:
remfx/datasets.py
CHANGED
@@ -83,7 +83,7 @@ def locate_files(root: str, mode: str):
     print(f"Found {len(files)} files in GuitarSet {mode}.")
     file_list.append(sorted(files))
     # ------------------------- DSD100 ---------------------------------
-    dsd_100_dir = os.path.join(root, "DSD100")
+    dsd_100_dir = os.path.join(root, "DSD100/DSD100")
     if os.path.isdir(dsd_100_dir):
         files = glob.glob(
             os.path.join(dsd_100_dir, mode, "**", "*.wav"),
@@ -427,7 +427,13 @@ class EffectDataset(Dataset):
         chunk = None
         random_dataset_choice = random.choice(self.files)
         while chunk is None:
-            random_file_choice = random.choice(random_dataset_choice)
+            try:
+                random_file_choice = random.choice(random_dataset_choice)
+            except IndexError:
+                print("IndexError")
+                print(random_dataset_choice)
+                print(random_file_choice)
+                raise IndexError
             chunk = select_random_chunk(
                 random_file_choice, self.chunk_size, self.sample_rate
             )
@@ -572,7 +578,7 @@ class EffectDataset(Dataset):
         normalized_wet = self.normalize(wet)
 
         # Check STFT, pick different effects if necessary
-        stft = self.mrstft(normalized_wet, normalized_dry)
+        stft = self.mrstft(normalized_wet.unsqueeze(0), normalized_dry.unsqueeze(0))
         return normalized_dry, normalized_wet, dry_labels_tensor, wet_labels_tensor
 
 
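The `unsqueeze(0)` calls added to the `mrstft` line above suggest the loss is now fed batched tensors. A hedged sketch of that shape convention, assuming `self.mrstft` is auraloss's `MultiResolutionSTFTLoss` (the loss imported elsewhere in this repo) and that each normalized clip is a `(channels, samples)` tensor:

```python
# Illustrative only: give a single (channels, samples) clip a leading batch
# dimension before handing it to the multi-resolution STFT loss.
import torch
from auraloss.freq import MultiResolutionSTFTLoss

mrstft = MultiResolutionSTFTLoss()
wet = torch.randn(1, 48000)  # stand-in for normalized_wet, shape (channels, samples)
dry = torch.randn(1, 48000)  # stand-in for normalized_dry
loss = mrstft(wet.unsqueeze(0), dry.unsqueeze(0))  # shapes become (1, 1, 48000)
print(loss.item())
```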
remfx/models.py
CHANGED
@@ -4,7 +4,6 @@ import torchmetrics
 import pytorch_lightning as pl
 from torch import Tensor, nn
 from torchaudio.models import HDemucs
-from audio_diffusion_pytorch import DiffusionModel
 from auraloss.time import SISDRLoss
 from auraloss.freq import MultiResolutionSTFTLoss
 from umx.openunmix.model import OpenUnmix, Separator
@@ -343,21 +342,6 @@ class DemucsModel(nn.Module):
         return self.model(x).squeeze(1)
 
 
-class DiffusionGenerationModel(nn.Module):
-    def __init__(self, n_channels: int = 1):
-        super().__init__()
-        self.model = DiffusionModel(in_channels=n_channels)
-
-    def forward(self, batch):
-        x, target = batch
-        sampled_out = self.model.sample(x)
-        return self.model(x), sampled_out
-
-    def sample(self, x: Tensor, num_steps: int = 10) -> Tensor:
-        noise = torch.randn(x.shape).to(x)
-        return self.model.sample(noise, num_steps=num_steps)
-
-
 class DPTNetModel(nn.Module):
     def __init__(self, sample_rate, num_bins, **kwargs):
         super().__init__()
remfx/utils.py
CHANGED
@@ -3,7 +3,6 @@ from typing import List, Tuple
 import pytorch_lightning as pl
 from omegaconf import DictConfig
 from pytorch_lightning.utilities import rank_zero_only
-from frechet_audio_distance import FrechetAudioDistance
 import numpy as np
 import torch
 import torchaudio
@@ -52,9 +51,6 @@ def log_hyperparameters(
     if not trainer.logger:
         return
 
-    if type(trainer.logger) == pl.loggers.CSVLogger:
-        return
-
     hparams = {}
 
     # choose which parts of hydra config will be saved to loggers
@@ -77,38 +73,10 @@ def log_hyperparameters(
     if "callbacks" in config:
         hparams["callbacks"] = config["callbacks"]
 
-    logger.experiment.config.update(hparams)
-
-
-
-    def __init__(self, sample_rate: float):
-        super().__init__()
-        self.fad = FrechetAudioDistance(
-            use_pca=False, use_activation=False, verbose=False
-        )
-        self.fad.model = self.fad.model.to("cpu")
-        self.sr = sample_rate
-
-    def forward(self, audio_background, audio_eval):
-        embds_background = []
-        embds_eval = []
-        for sample in audio_background:
-            embd = self.fad.model.forward(sample.T.cpu().detach().numpy(), self.sr)
-            embds_background.append(embd.cpu().detach().numpy())
-        for sample in audio_eval:
-            embd = self.fad.model.forward(sample.T.cpu().detach().numpy(), self.sr)
-            embds_eval.append(embd.cpu().detach().numpy())
-        embds_background = np.concatenate(embds_background, axis=0)
-        embds_eval = np.concatenate(embds_eval, axis=0)
-        mu_background, sigma_background = self.fad.calculate_embd_statistics(
-            embds_background
-        )
-        mu_eval, sigma_eval = self.fad.calculate_embd_statistics(embds_eval)
-
-        fad_score = self.fad.calculate_frechet_distance(
-            mu_background, sigma_background, mu_eval, sigma_eval
-        )
-        return fad_score
+    if type(trainer.logger) == pl.loggers.CSVLogger:
+        logger.log_hyperparams(hparams)
+    else:
+        logger.experiment.config.update(hparams)
 
 
 def create_random_chunks(
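The `log_hyperparameters` change above replaces the old early return for `CSVLogger` with a branch, so hyperparameters now reach the CSV logs as well as W&B. A minimal standalone sketch of the same dispatch (the function name here is illustrative, not the repo's API):

```python
# Illustrative sketch of the dispatch introduced above: CSVLogger has no
# wandb-style .experiment.config, so it gets log_hyperparams(); other loggers
# (e.g. WandbLogger) have their run config updated directly.
import pytorch_lightning as pl


def save_hparams(logger, hparams: dict) -> None:
    if isinstance(logger, pl.loggers.CSVLogger):
        logger.log_hyperparams(hparams)
    else:
        logger.experiment.config.update(hparams)
```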
scripts/download.py
CHANGED
@@ -6,54 +6,62 @@ import shutil
 def download_zip_dataset(dataset_url: str, output_dir: str):
     zip_filename = os.path.basename(dataset_url)
     zip_name = zip_filename.replace(".zip", "")
-    os.
-
-
-
-
+    if not os.path.exists(os.path.join(output_dir, zip_name)):
+        os.system(f"wget -P {output_dir} {dataset_url}")
+        os.system(
+            f"""unzip {os.path.join(output_dir, zip_filename)} -d {os.path.join(output_dir, zip_name)}"""
+        )
+        os.system(f"rm {os.path.join(output_dir, zip_filename)}")
+    else:
+        print(
+            f"Dataset {zip_name} already downloaded at {output_dir}, skipping download."
+        )
 
 
 def process_dataset(dataset_dir: str, output_dir: str):
-    if dataset_dir == "
-        pass
-    elif dataset_dir == "audio_mono-mic":
+    if dataset_dir == "vocalset":
         pass
-    elif dataset_dir == "
+    elif dataset_dir == "guitarset":
         pass
-    elif dataset_dir == "
+    elif dataset_dir == "idmt-smt-drums":
         pass
-    elif dataset_dir == "
-
-        for dir in os.listdir(os.path.join(output_dir, dataset_dir, "Sources", "Dev")):
-            source = os.path.join(output_dir, dataset_dir, "Sources", "Dev", dir)
-            shutil.move(source, os.path.join(output_dir, dataset_dir))
-        shutil.rmtree(os.path.join(output_dir, dataset_dir, "Sources", "Dev"))
-        for dir in os.listdir(os.path.join(output_dir, dataset_dir, "Sources", "Test")):
-            source = os.path.join(output_dir, dataset_dir, "Sources", "Test", dir)
-            shutil.move(source, os.path.join(output_dir, dataset_dir))
-        shutil.rmtree(os.path.join(output_dir, dataset_dir, "Sources", "Test"))
-        shutil.rmtree(os.path.join(output_dir, dataset_dir, "Sources"))
+    elif dataset_dir == "dsd100":
+        dataset_root_dir = "DSD100/DSD100"
 
-
-    os.
-
-
+        shutil.rmtree(os.path.join(output_dir, dataset_root_dir, "Mixtures"))
+        for dir in os.listdir(
+            os.path.join(output_dir, dataset_root_dir, "Sources", "Dev")
+        ):
+            source = os.path.join(output_dir, dataset_root_dir, "Sources", "Dev", dir)
+            shutil.move(source, os.path.join(output_dir, dataset_root_dir))
+        shutil.rmtree(os.path.join(output_dir, dataset_root_dir, "Sources", "Dev"))
+        for dir in os.listdir(
+            os.path.join(output_dir, dataset_root_dir, "Sources", "Test")
+        ):
+            source = os.path.join(output_dir, dataset_root_dir, "Sources", "Test", dir)
+            shutil.move(source, os.path.join(output_dir, dataset_root_dir))
+        shutil.rmtree(os.path.join(output_dir, dataset_root_dir, "Sources", "Test"))
+        shutil.rmtree(os.path.join(output_dir, dataset_root_dir, "Sources"))
 
+        os.mkdir(os.path.join(output_dir, dataset_root_dir, "train"))
+        os.mkdir(os.path.join(output_dir, dataset_root_dir, "val"))
+        os.mkdir(os.path.join(output_dir, dataset_root_dir, "test"))
+        files = os.listdir(os.path.join(output_dir, dataset_root_dir))
         num = 0
         for dir in files:
-            if not os.path.isdir(os.path.join(output_dir,
+            if not os.path.isdir(os.path.join(output_dir, dataset_root_dir, dir)):
                 continue
             if dir == "train" or dir == "val" or dir == "test":
                 continue
-            source = os.path.join(output_dir,
+            source = os.path.join(output_dir, dataset_root_dir, dir, "bass.wav")
             if num < 80:
-                dest = os.path.join(output_dir,
+                dest = os.path.join(output_dir, dataset_root_dir, "train", f"{num}.wav")
             elif num < 90:
-                dest = os.path.join(output_dir,
+                dest = os.path.join(output_dir, dataset_root_dir, "val", f"{num}.wav")
             else:
-                dest = os.path.join(output_dir,
+                dest = os.path.join(output_dir, dataset_root_dir, "test", f"{num}.wav")
             shutil.move(source, dest)
-            shutil.rmtree(os.path.join(output_dir,
+            shutil.rmtree(os.path.join(output_dir, dataset_root_dir, dir))
             num += 1
 
     else:
@@ -81,11 +89,12 @@ if __name__ == "__main__":
     dataset_urls = {
         "vocalset": "https://zenodo.org/record/1442513/files/VocalSet1-2.zip",
         "guitarset": "https://zenodo.org/record/3371780/files/audio_mono-mic.zip",
-        "
-        "
+        "dsd100": "http://liutkus.net/DSD100.zip",
+        "idmt-smt-drums": "https://zenodo.org/record/7544164/files/IDMT-SMT-DRUMS-V2.zip",
     }
 
     for dataset_name, dataset_url in dataset_urls.items():
         if dataset_name in args.dataset_names:
+            print("Downloading dataset: ", dataset_name)
             download_zip_dataset(dataset_url, args.output_dir)
-            process_dataset(dataset_name, args.
+            process_dataset(dataset_name, args.output_dir)
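`download_zip_dataset` above shells out to `wget`, `unzip`, and `rm` through `os.system`, so those tools must be on PATH for the script to work. A small, hypothetical pre-flight check (not part of this commit) that could be run before downloading:

```python
# Hypothetical helper: verify the external tools download_zip_dataset relies on.
import shutil


def check_download_tools() -> None:
    missing = [tool for tool in ("wget", "unzip") if shutil.which(tool) is None]
    if missing:
        raise RuntimeError(f"Required tools not found on PATH: {', '.join(missing)}")
```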
scripts/generate_dataset.py
ADDED
@@ -0,0 +1,15 @@
+import pytorch_lightning as pl
+import hydra
+from omegaconf import DictConfig
+
+
+@hydra.main(version_base=None, config_path="../cfg", config_name="config.yaml")
+def main(cfg: DictConfig):
+    # Apply seed for reproducibility
+    if cfg.seed:
+        pl.seed_everything(cfg.seed)
+    datamodule = hydra.utils.instantiate(cfg.datamodule, _convert_="partial")
+
+
+if __name__ == "__main__":
+    main()
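Note that the new script only instantiates the datamodule; presumably the `EffectDataset` objects render their chunks to disk during construction when `render_files=True`, which is why no explicit render call is needed. The corresponding CLI usage is the `python scripts/generate_dataset.py +exp=chorus_aug` example added to the README above.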
setup.py
CHANGED
@@ -35,18 +35,14 @@ setup(
         "scipy",
         "numpy",
         "torchvision",
-        "pytorch-lightning",
+        "pytorch-lightning>=2.0.0",
         "numba",
         "wandb",
-        "audio-diffusion-pytorch",
-        "ema_pytorch",
         "einops",
-        "librosa",
         "hydra-core",
         "auraloss",
         "pyloudnorm",
         "pedalboard",
-        "frechet_audio_distance",
        "asteroid",
     ],
     include_package_data=True,