## PanNuke Preparation
The original PanNuke dataset stores each dataset split (fold) as a few large arrays, laid out as follows:
```bash
├── fold0
│   ├── images.npy
│   ├── masks.npy
│   └── types.npy
├── fold1
│   ├── images.npy
│   ├── masks.npy
│   └── types.npy
└── fold2
    ├── images.npy
    ├── masks.npy
    └── types.npy
```
For memory efficiency and to make use of multi-threaded data loading with our augmentation pipeline, we reassemble the dataset into the following structure:
```bash
├── fold0
│   ├── cell_count.csv      # cell-count for each image to be used in sampling
│   ├── images              # H&E Image for each sample as .png files
│   │   ├── 0_0.png
│   │   ├── 0_1.png
│   │   ├── 0_2.png
│   │   ...
│   ├── labels              # label as .npy arrays for each sample
│   │   ├── 0_0.npy
│   │   ├── 0_1.npy
│   │   ├── 0_2.npy
│   │   ...
│   └── types.csv           # csv file with type for each image
├── fold1
│   ├── cell_count.csv
│   ├── images
│   │   ├── 1_0.png
│   │   ...
│   ├── labels
│   │   ├── 1_0.npy
│   │   ...
│   └── types.csv
├── fold2
│   ├── cell_count.csv
│   ├── images
│   │   ├── 2_0.png
│   │   ...
│   ├── labels
│   │   ├── 2_0.npy
│   │   ...
│   └── types.csv
├── dataset_config.yaml     # dataset config with dataset information
└── weight_config.yaml      # config file for our sampling
```
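The reassembly idea can be illustrated with a minimal sketch. It is not the provided preparation script, and it makes several assumptions: the mask array has shape `(N, H, W, C)` with one instance-ID channel per nucleus class and background as the last channel, the class order in `NUCLEI_TYPES` is as listed, and the CSV column names are hypothetical. It stores the raw mask slice as the label, which may differ from the label format the actual script writes. PNG export is left out because it needs an image library.

```python
import csv
from pathlib import Path

import numpy as np

# Assumed order of the five nucleus class channels (background is the last channel).
NUCLEI_TYPES = ["Neoplastic", "Inflammatory", "Connective", "Dead", "Epithelial"]


def split_fold(masks, types, fold_idx, out_dir):
    """Write one label file per sample plus cell_count.csv and types.csv.

    masks: array of shape (N, H, W, C), one instance-ID channel per class (assumption).
    types: sequence of N tissue-type strings.
    """
    out_dir = Path(out_dir)
    label_dir = out_dir / "labels"
    label_dir.mkdir(parents=True, exist_ok=True)

    count_rows, type_rows = [], []
    for i, (mask, tissue) in enumerate(zip(masks, types)):
        stem = f"{fold_idx}_{i}"
        # One small file per sample instead of one big array per fold.
        np.save(label_dir / f"{stem}.npy", mask)
        # Count distinct non-zero instance IDs in every class channel.
        counts = [
            int(np.count_nonzero(np.unique(mask[..., c])))
            for c in range(len(NUCLEI_TYPES))
        ]
        count_rows.append([f"{stem}.png", *counts])
        type_rows.append([f"{stem}.png", str(tissue)])

    # Hypothetical column names; check the provided configs for the real ones.
    with open(out_dir / "cell_count.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Image", *NUCLEI_TYPES])
        writer.writerows(count_rows)
    with open(out_dir / "types.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["img", "type"])
        writer.writerows(type_rows)
    return count_rows
```

With per-sample files, each data-loading worker reads only the samples it needs, and the per-image cell counts are available for sampling without touching the masks again.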
We provide all configuration files for the PanNuke dataset in the [`configs/datasets/PanNuke`](configs/datasets/PanNuke) folder. Please copy them into your dataset folder. Images and masks have to be extracted using the [`cell_segmentation/datasets/prepare_pannuke.py`](cell_segmentation/datasets/prepare_pannuke.py) script.
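
After extraction and copying the configs, it can help to verify that the dataset folder matches the structure shown above. The following is a small sketch using only the standard library. The helper names `check_fold` and `check_dataset` are hypothetical and not part of the repository; the checks only cover file presence and matching image/label names, not file contents.

```python
from pathlib import Path


def check_fold(fold_dir):
    """Return a list of problems found in one prepared fold directory."""
    fold_dir = Path(fold_dir)
    problems = []
    for name in ("cell_count.csv", "types.csv"):
        if not (fold_dir / name).is_file():
            problems.append(f"missing {name}")

    stems = {}
    for sub, suffix in (("images", ".png"), ("labels", ".npy")):
        sub_dir = fold_dir / sub
        if sub_dir.is_dir():
            stems[sub] = {p.stem for p in sub_dir.glob(f"*{suffix}")}
        else:
            problems.append(f"missing {sub}/")
            stems[sub] = set()

    # Every image needs a label with the same stem, and vice versa.
    for stem in sorted(stems["images"] - stems["labels"]):
        problems.append(f"image {stem}.png has no label")
    for stem in sorted(stems["labels"] - stems["images"]):
        problems.append(f"label {stem}.npy has no image")
    return problems


def check_dataset(root, folds=("fold0", "fold1", "fold2")):
    """Return a dict mapping 'root' or a fold name to its list of problems."""
    root = Path(root)
    report = {}
    for name in ("dataset_config.yaml", "weight_config.yaml"):
        if not (root / name).is_file():
            report.setdefault("root", []).append(f"missing {name}")
    for fold in folds:
        problems = check_fold(root / fold)
        if problems:
            report[fold] = problems
    return report
```

An empty dictionary from `check_dataset("path/to/dataset")` means that every expected file is present and every image has a matching label.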