Benchmarks

Here we benchmark the training speed of Mask R-CNN in detectron2 and compare it with several other popular open-source Mask R-CNN implementations.

Setup

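Hardware: 8 NVIDIA V100s with NVLink.
Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.5, TensorFlow 1.15.0rc2, Keras 2.2.5, MxNet 1.6.0b20190820.
Model: an end-to-end R-50-FPN Mask R-CNN model, using the same hyperparameters as the Detectron baseline config (which has no scale augmentation).
Metrics: we use the average throughput over iterations 100-500 to skip GPU warm-up time. Note that for R-CNN-style models, a model's throughput typically changes during training, because it depends on the model's predictions. Therefore this metric is not directly comparable with the "train speed" reported in the model zoo, which is the average speed over the entire training run.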
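The throughput metric described above can be sketched as follows. This is a minimal illustration, not the script used to produce these numbers: it assumes per-iteration wall-clock times have already been recorded, treats "iterations 100-500" as the half-open range [100, 500), and takes a hypothetical `images_per_iter` argument (16 would correspond to 2 images per GPU on 8 GPUs, also an assumption).

```python
from typing import Sequence


def average_throughput(
    iter_times: Sequence[float],
    images_per_iter: int,
    start: int = 100,
    end: int = 500,
) -> float:
    """Images per second over iterations [start, end).

    iter_times[i] is the wall-clock duration of iteration i in seconds;
    iterations before `start` are treated as GPU warm-up and ignored.
    """
    window = list(iter_times[start:end])
    if not window:
        raise ValueError("fewer than `start` iterations were recorded")
    # Total images processed in the window divided by total time spent.
    return images_per_iter * len(window) / sum(window)


if __name__ == "__main__":
    # Hypothetical timings: 100 slow warm-up iterations, then 0.25 s each.
    times = [1.0] * 100 + [0.25] * 400
    print(average_throughput(times, images_per_iter=16))  # 64.0
```

Dividing total images by total time, rather than averaging per-iteration rates, gives slow iterations their full weight in the result.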

Main Results

| Implementation | Throughput (img/s) |
|---|---|
| Detectron2 | 62 |
| mmdetection | 53 |
| maskrcnn-benchmark | 53 |
| tensorpack | 50 |
| simpledet | 39 |
| Detectron | 19 |
| matterport/Mask_RCNN | 14 |
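The relative speeds implied by these results can be computed directly. The dictionary below copies the throughput numbers, taking the first row to be detectron2 itself; `THROUGHPUT` and `speedup_of` are illustrative names, and the ratios inherit every caveat of the metric described in Setup.

```python
from typing import Mapping

# Throughputs (img/s) copied from the results table.
THROUGHPUT = {
    "Detectron2": 62,
    "mmdetection": 53,
    "maskrcnn-benchmark": 53,
    "tensorpack": 50,
    "simpledet": 39,
    "Detectron": 19,
    "matterport/Mask_RCNN": 14,
}


def speedup_of(baseline: str, other: str,
               table: Mapping[str, float] = THROUGHPUT) -> float:
    """How many times more images per second `baseline` processes than `other`."""
    return table[baseline] / table[other]


if __name__ == "__main__":
    for name in THROUGHPUT:
        print(f"{name:22s} {speedup_of('Detectron2', name):.2f}x")
```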

Details for each implementation:

Detectron2: with release v0.1.2, run

```
python tools/train_net.py --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8
```

mmdetection: at commit b0d845f, run

```
./tools/dist_train.sh configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_1x_coco.py 8
```

maskrcnn-benchmark: use commit 0ce8f6f and run `sed -i 's/torch.uint8/torch.bool/g' **/*.py; sed -i 's/AT_CHECK/TORCH_CHECK/g' **/*.cu` to make it compatible with PyTorch 1.5. Then run training with
```
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
```
We observed a higher speed than the one reported in its model zoo, likely due to different software versions.
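A note on the `sed` step for maskrcnn-benchmark: in bash, `**` only matches recursively when `shopt -s globstar` is enabled; otherwise `**/*.py` behaves like `*/*.py` and skips deeper directories (zsh recurses by default). Below is a shell-independent sketch of the same rewrite. It is not part of the original benchmark instructions, and `patch_tree` and `SUBSTITUTIONS` are hypothetical names.

```python
import sys
from pathlib import Path

# (file suffix, old text, new text): the same rewrites as the two sed commands.
SUBSTITUTIONS = [
    (".py", "torch.uint8", "torch.bool"),
    (".cu", "AT_CHECK", "TORCH_CHECK"),
]


def patch_tree(root: str) -> int:
    """Apply SUBSTITUTIONS to matching files anywhere under `root`.

    Returns the number of files that were rewritten.
    """
    changed = 0
    for suffix, old, new in SUBSTITUTIONS:
        # rglob recurses into every subdirectory, regardless of shell settings.
        for path in Path(root).rglob(f"*{suffix}"):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8")
            if old in text:
                # Like sed's /g flag, str.replace rewrites every occurrence.
                path.write_text(text.replace(old, new), encoding="utf-8")
                changed += 1
    return changed


if __name__ == "__main__":
    # Run from the maskrcnn-benchmark checkout, or pass its path.
    print(patch_tree(sys.argv[1] if len(sys.argv) > 1 else "."), "files patched")
```

Literal `str.replace` is slightly stricter than the sed version: in the sed pattern `torch.uint8`, the `.` is a regex wildcard that matches any character.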

tensorpack: at commit caafda, `export TF_CUDNN_USE_AUTOTUNE=0`, then run

```
mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz
```
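`export TF_CUDNN_USE_AUTOTUNE=0` affects every process later started from that shell. If you script the launch from Python instead, a minimal sketch is to copy the current environment and set the variable only for the child process, so the caller's environment is left untouched. `with_cudnn_autotune_disabled` is a hypothetical helper; `TF_CUDNN_USE_AUTOTUNE` is TensorFlow's switch for cuDNN convolution-algorithm autotuning.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def with_cudnn_autotune_disabled(
    argv: List[str], run: bool = False
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) with TF_CUDNN_USE_AUTOTUNE=0 set in a copy of os.environ.

    If `run` is True, the command is also executed with that environment.
    """
    env = dict(os.environ)  # copy, so the parent process is not modified
    env["TF_CUDNN_USE_AUTOTUNE"] = "0"
    if run:
        subprocess.run(argv, env=env, check=True)
    return argv, env
```

For example, `with_cudnn_autotune_disabled(["mpirun", "-np", "8", "./train.py", ...], run=True)` would start the tensorpack command above with the setting applied (the `...` stands for the remaining arguments).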

SimpleDet: at commit 9187a1, run

```
python detection_train.py --config config/mask_r50v1_fpn_1x.py
```

Detectron: run
```
python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml
```
Note that many of its ops run on the CPU, which limits its performance.

matterport/Mask_RCNN: at commit 3deaec, apply the diff below, `export TF_CUDNN_USE_AUTOTUNE=0`, then run

```
python coco.py train --dataset=/data/coco/ --model=imagenet
```

Note that many small details in this implementation may differ from Detectron's standards.

(Diff to make it use the same hyperparameters:)

```
diff --git i/mrcnn/model.py w/mrcnn/model.py
index 62cb2b0..61d7779 100644
--- i/mrcnn/model.py
+++ w/mrcnn/model.py
@@ -2367,8 +2367,8 @@ class MaskRCNN():
             epochs=epochs,
             steps_per_epoch=self.config.STEPS_PER_EPOCH,
             callbacks=callbacks,
-            validation_data=val_generator,
-            validation_steps=self.config.VALIDATION_STEPS,
+            #validation_data=val_generator,
+            #validation_steps=self.config.VALIDATION_STEPS,
             max_queue_size=100,
             workers=workers,
             use_multiprocessing=True,
diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py
index d2bf53b..060172a 100644
--- i/mrcnn/parallel_model.py
+++ w/mrcnn/parallel_model.py
@@ -32,6 +32,7 @@ class ParallelModel(KM.Model):
         keras_model: The Keras model to parallelize
         gpu_count: Number of GPUs. Must be > 1
         """
+        super().__init__()
         self.inner_model = keras_model
         self.gpu_count = gpu_count
         merged_outputs = self.make_parallel()
diff --git i/samples/coco/coco.py w/samples/coco/coco.py
index 5d172b5..239ed75 100644
--- i/samples/coco/coco.py
+++ w/samples/coco/coco.py
@@ -81,7 +81,10 @@ class CocoConfig(Config):
     IMAGES_PER_GPU = 2
 
     # Uncomment to train on 8 GPUs (default is 1)
-    # GPU_COUNT = 8
+    GPU_COUNT = 8
+    BACKBONE = "resnet50"
+    STEPS_PER_EPOCH = 50
+    TRAIN_ROIS_PER_IMAGE = 512
 
     # Number of classes (including background)
     NUM_CLASSES = 1 + 80  # COCO has 80 classes
@@ -496,29 +499,10 @@ if __name__ == '__main__':
         # *** This training schedule is an example. Update to your needs ***
 
         # Training - Stage 1
-        print("Training network heads")
         model.train(dataset_train, dataset_val,
                     learning_rate=config.LEARNING_RATE,
                     epochs=40,
-                    layers='heads',
-                    augmentation=augmentation)
-
-        # Training - Stage 2
-        # Finetune layers from ResNet stage 4 and up
-        print("Fine tune Resnet stage 4 and up")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE,
-                    epochs=120,
-                    layers='4+',
-                    augmentation=augmentation)
-
-        # Training - Stage 3
-        # Fine tune all layers
-        print("Fine tune all layers")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE / 10,
-                    epochs=160,
-                    layers='all',
+                    layers='3+',
                     augmentation=augmentation)
 
     elif args.command == "evaluate":
```
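The arithmetic behind the `CocoConfig` overrides in this diff can be checked with a standalone sketch. `BenchmarkCocoConfig` is a hypothetical mirror of the fields the diff sets, not matterport's class, and it does not import `mrcnn`; matterport's `Config` derives its batch size as `IMAGES_PER_GPU * GPU_COUNT`, which `batch_size` reproduces. The step count assumes the single `model.train(..., epochs=40)` call starts from epoch 0.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCocoConfig:
    """Hypothetical mirror of the CocoConfig fields changed by the diff."""
    GPU_COUNT: int = 8
    IMAGES_PER_GPU: int = 2
    BACKBONE: str = "resnet50"
    STEPS_PER_EPOCH: int = 50
    TRAIN_ROIS_PER_IMAGE: int = 512

    @property
    def batch_size(self) -> int:
        # Images consumed per training step across all GPUs.
        return self.GPU_COUNT * self.IMAGES_PER_GPU

    def total_steps(self, epochs: int) -> int:
        # Keras runs STEPS_PER_EPOCH steps in each epoch (assuming epoch 0 start).
        return self.STEPS_PER_EPOCH * epochs


if __name__ == "__main__":
    cfg = BenchmarkCocoConfig()
    print(cfg.batch_size, cfg.total_steps(40))
```

Under these assumptions each step uses 16 images and the 40-epoch run lasts 2000 steps, well past the iteration-500 end of the measurement window.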