# Data Preparation

## Data Format
Currently, our dataloader is able to load data from:

- a directory full of images (supports using `turbojpeg` to speed up image decoding)
- an `lmdb` file
- an image list
- a compressed file (e.g., a `zip` package)

Switch between these by modifying `data_format` in the configuration.
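As a sketch, switching formats might look like the snippet below. The field names (`data_format`, `data_root`) and the set of allowed values are illustrative assumptions; check your configuration files for the exact schema:

```python
# Hypothetical config fragment -- exact keys depend on the codebase.
data = dict(
    data_format='zip',            # assumed values: 'dir', 'lmdb', 'list', 'zip'
    data_root='data/images.zip',  # path to the data in the chosen format
)
```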
NOTE: For computing clusters whose I/O speed may be slow, we recommend the `zip` format for two reasons. First, a `zip` file is easy to create. Second, it allows loading one large file once instead of reading many small files repeatedly.
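To illustrate the "one large file" idea, a minimal sketch of reading images out of a zip archive with Python's standard `zipfile` module is shown below. The function name and extension filter are assumptions for illustration, not the loader's actual API; the point is that the archive is opened once and entries are read from it without per-file filesystem lookups:

```python
import zipfile

def iter_zip_images(zip_path, exts=(".jpg", ".jpeg", ".png")):
    """Yield (name, raw_bytes) for image entries in a zip archive.

    Hypothetical helper: the archive is opened a single time, so each
    image read is a seek within one large file rather than a separate
    small-file open, which is much friendlier to slow cluster I/O.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(exts):
                yield name, zf.read(name)
```

The raw bytes can then be handed to any image decoder (e.g., `turbojpeg` or Pillow) in memory.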
## Data Sampling
Considering that most generative models are trained in units of iterations rather than epochs, we replace the default data loader with an iteration-based one. The original distributed data sampler is also modified so that shuffling corresponds to iterations instead of epochs.
NOTE: To reduce the cost of re-loading data between epochs, we manually extend the length of the sampled indices, which makes loading much more efficient.
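The extended-indices trick can be sketched as follows. This is a simplified, hypothetical sampler (class name, constructor arguments, and shuffling scheme are all assumptions, not the library's actual implementation): instead of producing one epoch of indices and forcing the dataloader to restart, it pre-builds enough shuffled indices to cover all training iterations in a single pass.

```python
import random

class IterBasedSampler:
    """Hypothetical iteration-based sampler sketch.

    Builds num_iters * batch_size indices up front by concatenating
    shuffled passes over the dataset, so the dataloader never has to
    re-initialize between "epochs".
    """

    def __init__(self, dataset_size, num_iters, batch_size, seed=0):
        rng = random.Random(seed)
        total = num_iters * batch_size
        indices = []
        # Keep appending full shuffled passes until we have enough
        # indices for every training iteration.
        while len(indices) < total:
            chunk = list(range(dataset_size))
            rng.shuffle(chunk)
            indices.extend(chunk)
        self.indices = indices[:total]

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)
```

A distributed variant would additionally slice the index stream by rank, but the extension idea is the same.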
## Data Augmentation
To better align with the original implementations of PGGAN and StyleGAN (i.e., models that require progressive training), we support progressive resizing in `transforms.py`, which downsamples images by a factor of at most 2 at each step.
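The "factor of at most 2 per step" behavior can be sketched like this. The helper below is an illustrative assumption, not the actual `transforms.py` code: it repeatedly halves a square image via 2x2 average pooling until it reaches the target resolution, rather than jumping straight to the target size in one resize.

```python
def progressive_downsample(img, target):
    """Hypothetical sketch of progressive resizing.

    `img` is a square grayscale image as a list of lists of floats.
    Each loop iteration downsamples by exactly a factor of 2 using
    2x2 average pooling, so no single step exceeds that factor.
    """
    size = len(img)
    # Require a power-of-two ratio so repeated halving lands on target.
    ratio = size // target
    assert size % target == 0 and ratio & (ratio - 1) == 0

    while len(img) > target:
        n = len(img) // 2
        img = [
            [
                (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                 + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
                for c in range(n)
            ]
            for r in range(n)
        ]
    return img
```

Stepping down gradually (e.g., 1024 -> 512 -> ... -> 8) avoids the aliasing that a single large-factor resize would introduce, which matters for the low-resolution stages of progressive training.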