# Data Preparation

## InstructPix2Pix
```shell
bash scripts/download_data.sh path/to/clip-filtered-dataset
python convert_instructp2p.py --data-dir /path/to/clip-filtered-dataset/ --output-dir /path/to/output-dir/ --num-process 64
```

## OpenImage
```shell
wget https://storage.googleapis.com/openimages/2018_04/image_ids_and_rotation.csv
python convert_openimage.py --data-dir /path/to/image_ids_and_rotation.csv --output-dir /path/to/output-dir/ --num-process 8 --cuda_device [0, 1, 2, 3, 4, 5, 6, 7]
```

To preprocess the data across multiple nodes, specify the `--num-machine` and `--machine-id` arguments. For example, to preprocess the data on 8 nodes, run the following command on node 0:
```shell
python convert_openimage.py --data-dir /path/to/image_ids_and_rotation.csv --output-dir /path/to/output-dir/ --num-process 8 --cuda_device [0, 1, 2, 3, 4, 5, 6, 7] --num-machine 8 --machine-id 0
```
Then run the following command on node 1:
```shell
python convert_openimage.py --data-dir /path/to/image_ids_and_rotation.csv --output-dir /path/to/output-dir/ --num-process 8 --cuda_device [0, 1, 2, 3, 4, 5, 6, 7] --num-machine 8 --machine-id 1
```
Repeat on each remaining node, incrementing `--machine-id` by one each time.
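
The per-node commands above differ only in the value of `--machine-id`, so the full set can be generated with a short loop. The sketch below only prints the commands and does not run them. It assumes machine IDs are zero-based and run from 0 to 7 for 8 nodes, which is inferred from the two examples above. `NUM_MACHINES` and `print_node_commands` are illustrative names, not part of the repository.

```shell
# Sketch: print the command to run on each node of an 8-node job.
# NUM_MACHINES and print_node_commands are hypothetical helper names.
NUM_MACHINES=8

print_node_commands() {
  machine_id=0
  # Assumption: IDs are zero-based, so they run from 0 to NUM_MACHINES - 1.
  while [ "$machine_id" -lt "$NUM_MACHINES" ]; do
    echo "python convert_openimage.py --data-dir /path/to/image_ids_and_rotation.csv --output-dir /path/to/output-dir/ --num-process 8 --cuda_device [0, 1, 2, 3, 4, 5, 6, 7] --num-machine $NUM_MACHINES --machine-id $machine_id"
    machine_id=$((machine_id + 1))
  done
}

print_node_commands
```

Copy each printed line to the matching node, or adapt the loop to whatever job launcher your cluster uses.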