# Use Custom Datasets

Datasets that have builtin support in detectron2 are listed in [datasets](../../datasets).
If you want to use a custom dataset while also reusing detectron2's data loaders,
you will need to

1. __Register__ your dataset (i.e., tell detectron2 how to obtain your dataset).
2. Optionally, __register metadata__ for your dataset.

Next, we explain the above two concepts in detail.

The [Colab tutorial](
has a live example of how to register and train on a dataset in a custom format.
### Register a Dataset

To let detectron2 know how to obtain a dataset named "my_dataset", you will implement
a function that returns the items in your dataset and then tell detectron2 about this
function:
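```python
from detectron2.data import DatasetCatalog

def my_dataset_function():
    # Load your data and build a list[dict] in the format described below.
    dataset_dicts = []
    ...
    return dataset_dicts

# Tell detectron2 which function returns the data for "my_dataset".
DatasetCatalog.register("my_dataset", my_dataset_function)
```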
Here, the snippet associates a dataset named "my_dataset" with a function that returns the data.
The function must return the same data if called multiple times.
The registration stays effective until the process exits.
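
As a minimal sketch of the points above, the snippet below registers two splits by binding the split name into a loader function. The loader `load_my_split`, the split names, and the file paths are hypothetical and only stand in for real data. The detectron2 import is guarded so the sketch also runs where detectron2 is not installed.

```python
from functools import partial


def load_my_split(split):
    """Hypothetical loader: returns list[dict] for one split.

    The result is built deterministically, so repeated calls return the same data.
    """
    # Placeholder records; a real loader would parse annotation files here.
    return [
        {"file_name": f"/path/to/{split}/{i:04d}.jpg", "image_id": f"{split}_{i}"}
        for i in range(3)
    ]


# Map each dataset name to a zero-argument function, as DatasetCatalog expects.
dataset_functions = {
    f"my_dataset_{split}": partial(load_my_split, split)
    for split in ("train", "val")
}

try:
    from detectron2.data import DatasetCatalog
except ImportError:
    DatasetCatalog = None  # detectron2 not installed; skip the actual registration

if DatasetCatalog is not None:
    for name, func in dataset_functions.items():
        DatasetCatalog.register(name, func)
```

Using `partial` (or a lambda with a default argument) binds the split name at definition time, so each registered function keeps loading its own split.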

The function can process data from its original format into one of the following:

1. Detectron2's standard dataset dict, described below. This works with many other builtin
   features in detectron2, so it's recommended to use it when it's sufficient for your task.
2. Your custom dataset dict. You can also return arbitrary dicts in your own format,
   such as adding extra keys for new tasks.
   Then you will need to handle them properly downstream as well.
   See below for more details.
#### Standard Dataset Dicts

For standard tasks
(instance detection, instance/semantic/panoptic segmentation, keypoint detection),
we load the original dataset into `list[dict]` with a specification similar to COCO's json annotations.
This is our standard representation for a dataset.

Each dict contains information about one image.
The dict may have the following fields,
and the required fields vary based on what the dataloader or the task needs (see more below).

+ `file_name`: the full path to the image file. Rotation and flipping are applied if the image has such EXIF information.
+ `height`, `width`: integer. The shape of the image.
+ `image_id` (str or int): a unique id that identifies this image. Used
  during evaluation to identify the images, but a dataset may use it for different purposes.
+ `annotations` (list[dict]): each dict corresponds to annotations of one instance
  in this image. Required by instance detection/segmentation or keypoint detection tasks.
  Images with empty `annotations` will by default be removed from training,
  but can be included using `DATALOADER.FILTER_EMPTY_ANNOTATIONS`.
  Each dict contains the following keys, of which `bbox`, `bbox_mode` and `category_id` are required:
  + `bbox` (list[float]): list of 4 numbers representing the bounding box of the instance.
  + `bbox_mode` (int): the format of bbox.
    It must be a member of
    [structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode).
    Currently supports: `BoxMode.XYXY_ABS`, `BoxMode.XYWH_ABS`.
  + `category_id` (int): an integer in the range [0, num_categories-1] representing the category label.
    The value num_categories is reserved to represent the "background" category, if applicable.
  + `segmentation` (list[list[float]] or dict): the segmentation mask of the instance.
    + If `list[list[float]]`, it represents a list of polygons, one for each connected component
      of the object. Each `list[float]` is one simple polygon in the format of `[x1, y1, ..., xn, yn]`.
      The Xs and Ys are absolute coordinates in units of pixels.
    + If `dict`, it represents the per-pixel segmentation mask in COCO's compressed RLE format.
      The dict should have keys "size" and "counts". You can convert a uint8 segmentation mask of 0s and
      1s into such a dict by `pycocotools.mask.encode(np.asarray(mask, order="F"))`.
      `cfg.INPUT.MASK_FORMAT` must be set to `bitmask` if using the default data loader with such format.
  + `keypoints` (list[float]): in the format of [x1, y1, v1,..., xn, yn, vn].
    v[i] means the [visibility]( of this keypoint.
    `n` must be equal to the number of keypoint categories.
    The Xs and Ys are either relative coordinates in [0, 1], or absolute coordinates,
    depending on whether "bbox_mode" is relative.
    Note that the coordinate annotations in COCO format are integers in range [0, H-1 or W-1].
    By default, detectron2 adds 0.5 to absolute keypoint coordinates to convert them from discrete
    pixel indices to floating point coordinates.
  + `iscrowd`: 0 (default) or 1. Whether this instance is labeled as COCO's "crowd
    region". Don't include this field if you don't know what it means.
+ `sem_seg_file_name`: the full path to the ground truth semantic segmentation file.
  Required by the semantic segmentation task.
  It should be an image whose pixel values are integer labels.
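
The fields above can be assembled as in the following sketch, which converts a hypothetical in-memory annotation table into standard dataset dicts. `RAW_ANNOTATIONS`, `to_standard_dicts`, and all paths and values are made up for illustration. The fallback `BoxMode` class is an assumption that mirrors the integer values of `detectron2.structures.BoxMode` for the two supported modes; it is only used when detectron2 is not importable.

```python
from enum import IntEnum

try:
    from detectron2.structures import BoxMode
except ImportError:
    # Assumption: stand-in mirroring the two BoxMode members named in this document.
    class BoxMode(IntEnum):
        XYXY_ABS = 0
        XYWH_ABS = 1

# Hypothetical raw annotations: one record per image.
RAW_ANNOTATIONS = [
    {
        "path": "/path/to/images/0001.jpg",
        "size": (480, 640),  # (height, width)
        "objects": [
            {
                "label": 0,
                "box_xywh": [10.0, 20.0, 100.0, 50.0],
                "polygon": [(10.0, 20.0), (110.0, 20.0), (110.0, 70.0), (10.0, 70.0)],
            },
        ],
    },
]


def to_standard_dicts(raw):
    """Convert the hypothetical raw records into standard dataset dicts."""
    dataset_dicts = []
    for image_id, record in enumerate(raw):
        height, width = record["size"]
        annotations = []
        for obj in record["objects"]:
            # Flatten [(x1, y1), ...] into [x1, y1, ..., xn, yn].
            flat_polygon = [coord for point in obj["polygon"] for coord in point]
            annotations.append({
                "bbox": obj["box_xywh"],
                "bbox_mode": BoxMode.XYWH_ABS,
                "category_id": obj["label"],
                "segmentation": [flat_polygon],  # one polygon per connected component
            })
        dataset_dicts.append({
            "file_name": record["path"],
            "height": height,
            "width": width,
            "image_id": image_id,
            "annotations": annotations,
        })
    return dataset_dicts


dataset_dicts = to_standard_dicts(RAW_ANNOTATIONS)
```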

Fast R-CNN (with precomputed proposals) is rarely used today.
To train a Fast R-CNN, the following extra keys are needed:

+ `proposal_boxes` (array): 2D numpy array with shape (K, 4) representing K precomputed proposal boxes for this image.
+ `proposal_objectness_logits` (array): numpy array with shape (K, ), which corresponds to the objectness
  logits of proposals in 'proposal_boxes'.
+ `proposal_bbox_mode` (int): the format of the precomputed proposal bbox.
  It must be a member of
  [structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode).
  Default is `BoxMode.XYXY_ABS`.
#### Custom Dataset Dicts for New Tasks

In the `list[dict]` that your dataset function returns, each dictionary can also have arbitrary custom data.
This is useful for a new task that needs extra information not supported
by the standard dataset dicts. In this case, you need to make sure the downstream code can handle your data
correctly. Usually this requires writing a new `mapper` for the dataloader (see [Use Custom Dataloaders](./

When designing a custom format, note that all dicts are stored in memory
(sometimes serialized and with multiple copies).
To save memory, each dict is meant to contain small but sufficient information
about each sample, such as file names and annotations.
Loading full samples typically happens in the data loader.

For attributes shared among the entire dataset, use `Metadata` (see below).
To avoid extra memory, do not save such information repeatedly for each sample.
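
One way to picture the memory advice above is to keep only paths and small annotations in each dict and defer heavy loading to the mapping step. Everything in this sketch is hypothetical: the `caption_file` key, `my_caption_dataset`, and `lazy_mapper` are illustrative names, and a real mapper would follow the interface expected by the dataloader you use.

```python
import copy
import json
import os
import tempfile


def my_caption_dataset(root):
    """Return lightweight dicts: paths and ids only, no loaded file contents."""
    return [
        {
            "file_name": os.path.join(root, f"{i:04d}.jpg"),
            "image_id": i,
            "caption_file": os.path.join(root, f"{i:04d}.json"),  # custom key
        }
        for i in range(2)
    ]


def lazy_mapper(dataset_dict):
    """Hypothetical mapper: load the heavy data only when a sample is requested."""
    sample = copy.deepcopy(dataset_dict)  # do not mutate the stored dict
    with open(sample.pop("caption_file")) as f:
        sample["caption"] = json.load(f)["caption"]
    return sample


# Tiny demonstration with a temporary directory standing in for a dataset root.
with tempfile.TemporaryDirectory() as root:
    for i in range(2):
        with open(os.path.join(root, f"{i:04d}.json"), "w") as f:
            json.dump({"caption": f"caption {i}"}, f)
    dicts = my_caption_dataset(root)
    mapped = [lazy_mapper(d) for d in dicts]
```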
### "Metadata" for Datasets

Each dataset is associated with some metadata, accessible through
`MetadataCatalog.get(dataset_name).some_metadata`.
Metadata is a key-value mapping that contains information that's shared among
the entire dataset, and usually is used to interpret what's in the dataset, e.g.,
names of classes, colors of classes, root of files, etc.
This information will be useful for augmentation, evaluation, visualization, logging, etc.
The structure of metadata depends on what is needed by the corresponding downstream code.

If you register a new dataset through `DatasetCatalog.register`,
you may also want to add its corresponding metadata through
`MetadataCatalog.get(dataset_name).some_key = some_value`, to enable any features that need the metadata.
You can do it like this (using the metadata key "thing_classes" as an example):
```python
from detectron2.data import MetadataCatalog
MetadataCatalog.get("my_dataset").thing_classes = ["person", "dog"]
```
Here is a list of metadata keys that are used by builtin features in detectron2.
If you add your own dataset without these metadata, some features may be
unavailable to you:

* `thing_classes` (list[str]): Used by all instance detection/segmentation tasks.
  A list of names for each instance/thing category.
  If you load a COCO format dataset, it will be automatically set by the function `load_coco_json`.
* `thing_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each thing category.
  Used for visualization. If not given, random colors are used.
* `stuff_classes` (list[str]): Used by semantic and panoptic segmentation tasks.
  A list of names for each stuff category.
* `stuff_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each stuff category.
  Used for visualization. If not given, random colors are used.
* `keypoint_names` (list[str]): Used by keypoint localization. A list of names for each keypoint.
* `keypoint_flip_map` (list[tuple[str]]): Used by the keypoint localization task. A list of pairs of names,
  where each pair names the two keypoints that should be swapped if the image is
  flipped horizontally during augmentation.
* `keypoint_connection_rules` (list[tuple(str, str, (r, g, b))]): Each tuple specifies a pair of keypoints
  that are connected and the color to use for the line between them when visualized.
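
To make the key formats above concrete, here is a small sketch that collects several of them in a plain dict, checks that they are mutually consistent, and applies them in one call. The class names, keypoint names, and colors are illustrative assumptions. The sketch relies on `Metadata.set(**kwargs)` to assign several keys at once, and the detectron2 import is guarded so the checks also run without it.

```python
# Hypothetical two-class keypoint dataset; all names and colors are illustrative.
metadata = {
    "thing_classes": ["person", "dog"],
    "thing_colors": [(220, 20, 60), (0, 128, 255)],
    "keypoint_names": ["nose", "left_eye", "right_eye"],
    "keypoint_flip_map": [("left_eye", "right_eye")],
    "keypoint_connection_rules": [
        ("nose", "left_eye", (255, 0, 0)),
        ("nose", "right_eye", (0, 255, 0)),
    ],
}

# Sanity checks: one color per class, and every referenced keypoint is defined.
assert len(metadata["thing_colors"]) == len(metadata["thing_classes"])
known = set(metadata["keypoint_names"])
assert all(a in known and b in known for a, b in metadata["keypoint_flip_map"])
assert all(a in known and b in known for a, b, _ in metadata["keypoint_connection_rules"])

try:
    from detectron2.data import MetadataCatalog
except ImportError:
    MetadataCatalog = None  # detectron2 not installed; skip the actual call

if MetadataCatalog is not None:
    # Assign all keys on the metadata object of "my_dataset" in one call.
    MetadataCatalog.get("my_dataset").set(**metadata)
```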

Some additional metadata that are specific to the evaluation of certain datasets (e.g. COCO):

* `thing_dataset_id_to_contiguous_id` (dict[int->int]): Used by all instance detection/segmentation tasks in the COCO format.
  A mapping from instance class ids in the dataset to contiguous ids in range [0, #class).
  Will be automatically set by the function `load_coco_json`.
* `stuff_dataset_id_to_contiguous_id` (dict[int->int]): Used when generating prediction json files for
  semantic/panoptic segmentation.
  A mapping from semantic segmentation class ids in the dataset
  to contiguous ids in [0, num_categories). It is useful for evaluation only.
* `json_file`: The COCO annotation json file. Used by COCO evaluation for COCO-format datasets.
* `panoptic_root`, `panoptic_json`: Used by panoptic evaluation.
* `evaluator_type`: Used by the builtin main training script to select the
  evaluator. Don't use it in a new training script.
  You can just provide the [DatasetEvaluator](../modules/evaluation.html#detectron2.evaluation.DatasetEvaluator)
  for your dataset directly in your main script.
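
The id mappings above can be illustrated with plain Python. This is a sketch under the assumption that contiguous ids follow the sorted order of the dataset's own ids; the category ids listed here are hypothetical.

```python
# Hypothetical category ids as they appear in an annotation file (non-contiguous).
dataset_category_ids = [1, 2, 5, 7, 90]

# Sort the dataset ids, then number them 0..N-1.
thing_dataset_id_to_contiguous_id = {
    dataset_id: contiguous_id
    for contiguous_id, dataset_id in enumerate(sorted(dataset_category_ids))
}

# The inverse mapping lets evaluation code write predictions back
# using the dataset's original ids.
contiguous_id_to_dataset_id = {
    contiguous_id: dataset_id
    for dataset_id, contiguous_id in thing_dataset_id_to_contiguous_id.items()
}
```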

NOTE: For background on the concept of "thing" and "stuff", see
[On Seeing Stuff: The Perception of Materials by Humans and Machines](
In detectron2, the term "thing" is used for instance-level tasks,
and "stuff" is used for semantic segmentation tasks.
Both are used in panoptic segmentation.
### Register a COCO Format Dataset

If your dataset is already a json file in the COCO format,
the dataset and its associated metadata can be registered easily with:
```python
from detectron2.data.datasets import register_coco_instances
register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir")
```

If your dataset is in COCO format but with extra custom per-instance annotations,
the [load_coco_json](../modules/
function might be useful.
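
A possible way to use it is sketched below. The sketch assumes `load_coco_json` accepts the `dataset_name` and `extra_annotation_keys` arguments. The helper `register_coco_with_extra_keys` and the `"occluded"` key are hypothetical names chosen for illustration. The detectron2 imports sit inside the function so that defining it does not require detectron2.

```python
def register_coco_with_extra_keys(name, json_file, image_root, extra_keys):
    """Register a COCO-format dataset and keep extra per-instance annotation keys."""
    # Imported here so that defining this function does not require detectron2.
    from detectron2.data import DatasetCatalog, MetadataCatalog
    from detectron2.data.datasets import load_coco_json

    # Passing dataset_name lets load_coco_json fill in metadata such as thing_classes.
    DatasetCatalog.register(
        name,
        lambda: load_coco_json(
            json_file, image_root, dataset_name=name, extra_annotation_keys=extra_keys
        ),
    )
    # Record where the data lives; json_file is used by COCO evaluation.
    MetadataCatalog.get(name).set(json_file=json_file, image_root=image_root)


# Hypothetical usage; "occluded" stands for a custom per-instance field in the json:
# register_coco_with_extra_keys(
#     "my_dataset", "json_annotation.json", "path/to/image/dir", ["occluded"]
# )
```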
### Update the Config for New Datasets

Once you've registered the dataset, you can use the name of the dataset (e.g., "my_dataset" in
the example above) in `cfg.DATASETS.{TRAIN,TEST}`.
There are other configs you might want to change to train or evaluate on new datasets:

* `MODEL.ROI_HEADS.NUM_CLASSES` and `MODEL.RETINANET.NUM_CLASSES` are the number of thing classes
  for R-CNN and RetinaNet models, respectively.
* `MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS` sets the number of keypoints for Keypoint R-CNN.
  You'll also need to set [Keypoint OKS](
  with `TEST.KEYPOINT_OKS_SIGMAS` for evaluation.
* `MODEL.SEM_SEG_HEAD.NUM_CLASSES` sets the number of stuff classes for Semantic FPN & Panoptic FPN.
* If you're training Fast R-CNN (with precomputed proposals), `DATASETS.PROPOSAL_FILES_{TRAIN,TEST}`
  need to match the datasets. The format of proposal files is documented
  [here](../modules/

New models
(e.g. [TensorMask](../../projects/TensorMask),
[PointRend](../../projects/PointRend))
often have similar configs of their own that need to be changed as well.
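
As a rough sketch of the config changes listed above for an R-CNN style model, the helper below sets the dataset names and the number of thing classes on a config object. The helper name `use_my_dataset` and the dataset names are hypothetical. A plain namespace stands in for the config so the sketch runs without detectron2; with detectron2 installed, the object returned by `get_cfg()` would be passed instead.

```python
from types import SimpleNamespace


def use_my_dataset(cfg, train_name, test_name, num_thing_classes):
    """Point a config at newly registered datasets (R-CNN style heads)."""
    cfg.DATASETS.TRAIN = (train_name,)
    cfg.DATASETS.TEST = (test_name,)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_thing_classes
    return cfg


# With detectron2 installed, one would typically start from:
#   from detectron2.config import get_cfg
#   cfg = get_cfg()
# Here a plain namespace stands in for the config so the sketch runs anywhere.
cfg = SimpleNamespace(
    DATASETS=SimpleNamespace(TRAIN=(), TEST=()),
    MODEL=SimpleNamespace(ROI_HEADS=SimpleNamespace(NUM_CLASSES=80)),
)
cfg = use_my_dataset(cfg, "my_dataset_train", "my_dataset_val", num_thing_classes=2)
```

Note that `DATASETS.TRAIN` and `DATASETS.TEST` are tuples of names, so a single dataset still needs the trailing comma.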