Spaces:
No application file
No application file
| # Benchmarks | |
| Here we benchmark the training speed of a Mask R-CNN in detectron2, | |
| with some other popular open source Mask R-CNN implementations. | |
| ### Settings | |
| * Hardware: 8 NVIDIA V100s with NVLink. | |
| * Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.5, | |
| TensorFlow 1.15.0rc2, Keras 2.2.5, MxNet 1.6.0b20190820. | |
| * Model: an end-to-end R-50-FPN Mask-RCNN model, using the same hyperparameter as the | |
| [Detectron baseline config](https://github.com/facebookresearch/Detectron/blob/master/configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml) | |
| (it does no have scale augmentation). | |
| * Metrics: We use the average throughput in iterations 100-500 to skip GPU warmup time. | |
| Note that for R-CNN-style models, the throughput of a model typically changes during training, because | |
| it depends on the predictions of the model. Therefore this metric is not directly comparable with | |
| "train speed" in model zoo, which is the average speed of the entire training run. | |
| ### Main Results | |
| ```eval_rst | |
| +-------------------------------+--------------------+ | |
| | Implementation | Throughput (img/s) | | |
| +===============================+====================+ | |
| | |D2| |PT| | 62 | | |
| +-------------------------------+--------------------+ | |
| | mmdetection_ |PT| | 53 | | |
| +-------------------------------+--------------------+ | |
| | maskrcnn-benchmark_ |PT| | 53 | | |
| +-------------------------------+--------------------+ | |
| | tensorpack_ |TF| | 50 | | |
| +-------------------------------+--------------------+ | |
| | simpledet_ |mxnet| | 39 | | |
| +-------------------------------+--------------------+ | |
| | Detectron_ |C2| | 19 | | |
| +-------------------------------+--------------------+ | |
| | `matterport/Mask_RCNN`__ |TF| | 14 | | |
| +-------------------------------+--------------------+ | |
| .. _maskrcnn-benchmark: https://github.com/facebookresearch/maskrcnn-benchmark/ | |
| .. _tensorpack: https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN | |
| .. _mmdetection: https://github.com/open-mmlab/mmdetection/ | |
| .. _simpledet: https://github.com/TuSimple/simpledet/ | |
| .. _Detectron: https://github.com/facebookresearch/Detectron | |
| __ https://github.com/matterport/Mask_RCNN/ | |
| .. |D2| image:: https://github.com/facebookresearch/detectron2/raw/master/.github/Detectron2-Logo-Horz.svg?sanitize=true | |
| :height: 15pt | |
| :target: https://github.com/facebookresearch/detectron2/ | |
| .. |PT| image:: https://pytorch.org/assets/images/logo-icon.svg | |
| :width: 15pt | |
| :height: 15pt | |
| :target: https://pytorch.org | |
| .. |TF| image:: https://static.nvidiagrid.net/ngc/containers/tensorflow.png | |
| :width: 15pt | |
| :height: 15pt | |
| :target: https://tensorflow.org | |
| .. |mxnet| image:: https://github.com/dmlc/web-data/raw/master/mxnet/image/mxnet_favicon.png | |
| :width: 15pt | |
| :height: 15pt | |
| :target: https://mxnet.apache.org/ | |
| .. |C2| image:: https://caffe2.ai/static/logo.svg | |
| :width: 15pt | |
| :height: 15pt | |
| :target: https://caffe2.ai | |
| ``` | |
| Details for each implementation: | |
| * __Detectron2__: with release v0.1.2, run: | |
| ``` | |
| python tools/train_net.py --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8 | |
| ``` | |
| * __mmdetection__: at commit `b0d845f`, run | |
| ``` | |
| ./tools/dist_train.sh configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_1x_coco.py 8 | |
| ``` | |
| * __maskrcnn-benchmark__: use commit `0ce8f6f` with `sed -i ‘s/torch.uint8/torch.bool/g’ **/*.py; sed -i 's/AT_CHECK/TORCH_CHECK/g' **/*.cu` | |
| to make it compatible with PyTorch 1.5. Then, run training with | |
| ``` | |
| python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml | |
| ``` | |
| The speed we observed is faster than its model zoo, likely due to different software versions. | |
| * __tensorpack__: at commit `caafda`, `export TF_CUDNN_USE_AUTOTUNE=0`, then run | |
| ``` | |
| mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz | |
| ``` | |
| * __SimpleDet__: at commit `9187a1`, run | |
| ``` | |
| python detection_train.py --config config/mask_r50v1_fpn_1x.py | |
| ``` | |
| * __Detectron__: run | |
| ``` | |
| python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml | |
| ``` | |
| Note that many of its ops run on CPUs, therefore the performance is limited. | |
| * __matterport/Mask_RCNN__: at commit `3deaec`, apply the following diff, `export TF_CUDNN_USE_AUTOTUNE=0`, then run | |
| ``` | |
| python coco.py train --dataset=/data/coco/ --model=imagenet | |
| ``` | |
| Note that many small details in this implementation might be different | |
| from Detectron's standards. | |
| <details> | |
| <summary> | |
| (diff to make it use the same hyperparameters - click to expand) | |
| </summary> | |
| ```diff | |
| diff --git i/mrcnn/model.py w/mrcnn/model.py | |
| index 62cb2b0..61d7779 100644 | |
| --- i/mrcnn/model.py | |
| +++ w/mrcnn/model.py | |
| @@ -2367,8 +2367,8 @@ class MaskRCNN(): | |
| epochs=epochs, | |
| steps_per_epoch=self.config.STEPS_PER_EPOCH, | |
| callbacks=callbacks, | |
| - validation_data=val_generator, | |
| - validation_steps=self.config.VALIDATION_STEPS, | |
| + #validation_data=val_generator, | |
| + #validation_steps=self.config.VALIDATION_STEPS, | |
| max_queue_size=100, | |
| workers=workers, | |
| use_multiprocessing=True, | |
| diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py | |
| index d2bf53b..060172a 100644 | |
| --- i/mrcnn/parallel_model.py | |
| +++ w/mrcnn/parallel_model.py | |
| @@ -32,6 +32,7 @@ class ParallelModel(KM.Model): | |
| keras_model: The Keras model to parallelize | |
| gpu_count: Number of GPUs. Must be > 1 | |
| """ | |
| + super().__init__() | |
| self.inner_model = keras_model | |
| self.gpu_count = gpu_count | |
| merged_outputs = self.make_parallel() | |
| diff --git i/samples/coco/coco.py w/samples/coco/coco.py | |
| index 5d172b5..239ed75 100644 | |
| --- i/samples/coco/coco.py | |
| +++ w/samples/coco/coco.py | |
| @@ -81,7 +81,10 @@ class CocoConfig(Config): | |
| IMAGES_PER_GPU = 2 | |
| # Uncomment to train on 8 GPUs (default is 1) | |
| - # GPU_COUNT = 8 | |
| + GPU_COUNT = 8 | |
| + BACKBONE = "resnet50" | |
| + STEPS_PER_EPOCH = 50 | |
| + TRAIN_ROIS_PER_IMAGE = 512 | |
| # Number of classes (including background) | |
| NUM_CLASSES = 1 + 80 # COCO has 80 classes | |
| @@ -496,29 +499,10 @@ if __name__ == '__main__': | |
| # *** This training schedule is an example. Update to your needs *** | |
| # Training - Stage 1 | |
| - print("Training network heads") | |
| model.train(dataset_train, dataset_val, | |
| learning_rate=config.LEARNING_RATE, | |
| epochs=40, | |
| - layers='heads', | |
| - augmentation=augmentation) | |
| - | |
| - # Training - Stage 2 | |
| - # Finetune layers from ResNet stage 4 and up | |
| - print("Fine tune Resnet stage 4 and up") | |
| - model.train(dataset_train, dataset_val, | |
| - learning_rate=config.LEARNING_RATE, | |
| - epochs=120, | |
| - layers='4+', | |
| - augmentation=augmentation) | |
| - | |
| - # Training - Stage 3 | |
| - # Fine tune all layers | |
| - print("Fine tune all layers") | |
| - model.train(dataset_train, dataset_val, | |
| - learning_rate=config.LEARNING_RATE / 10, | |
| - epochs=160, | |
| - layers='all', | |
| + layers='3+', | |
| augmentation=augmentation) | |
| elif args.command == "evaluate": | |
| ``` | |
| </details> | |