Data Augmentation

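Augmentation is an important part of training. Detectron2's data augmentation system aims to address the following goals:

1. Allow augmenting multiple data types together (e.g., an image together with its bounding boxes and masks)
2. Allow applying a sequence of statically declared augmentations
3. Allow adding custom new data types to augment (rotated bounding boxes, video clips, etc.)
4. Process and manipulate the **operations** that are applied by augmentations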

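The first two features cover most common use cases and are also available in other libraries such as albumentations. Supporting the other features adds some overhead to detectron2's augmentation API, which we explain in this tutorial.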

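This tutorial focuses on how to use augmentations when writing new data loaders, and how to write new augmentations. If you use the default data loader in detectron2, it already supports taking a user-provided list of custom augmentations, as explained in the Dataloader tutorial.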

Basic Usage

The basic usage of features (1) and (2) is as follows:

from detectron2.data import transforms as T
# Define a sequence of augmentations:
augs = T.AugmentationList([
    T.RandomBrightness(0.9, 1.1),
    T.RandomFlip(prob=0.5),
    T.RandomCrop("absolute", (640, 640))
])  # type: T.Augmentation

# Define the augmentation input ("image" required, others optional):
input = T.AugInput(image, boxes=boxes, sem_seg=sem_seg)
# Apply the augmentation:
transform = augs(input)  # type: T.Transform
image_transformed = input.image  # new image
sem_seg_transformed = input.sem_seg  # new semantic segmentation

# For any extra data that needs to be augmented together, use transform, e.g.:
image2_transformed = transform.apply_image(image2)
polygons_transformed = transform.apply_polygons(polygons)

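Three basic concepts are involved here. They are:

T.Augmentation defines the **"policy"** to modify inputs.
- Its __call__(AugInput) -> Transform method augments the inputs in-place and returns the operation that was applied
T.Transform implements the actual **operations** to transform data
- It has methods such as apply_image and apply_coords that define how to transform each data type
T.AugInput stores the inputs needed by T.Augmentation and how they should be transformed. This concept is needed for some advanced usage. Using this class directly should be sufficient for all common use cases, since extra data not in T.AugInput can be augmented using the returned transform, as shown in the example above.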
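
The split between a policy and an operation can be illustrated with a minimal, library-free sketch. ToyHFlip, ToyNoOp, and ToyRandomFlip are hypothetical stand-ins written for this illustration; they are not detectron2 classes and only mimic the roles of T.Transform and T.Augmentation.

```python
import random


class ToyHFlip:
    """Operation: deterministically flips x coordinates for a known width."""

    def __init__(self, width):
        self.width = width

    def apply_coords(self, coords):
        return [(self.width - x, y) for x, y in coords]


class ToyNoOp:
    """Operation that leaves the data unchanged."""

    def apply_coords(self, coords):
        return list(coords)


class ToyRandomFlip:
    """Policy: decides at call time which operation to apply."""

    def __init__(self, prob, rng):
        self.prob = prob
        self.rng = rng

    def get_transform(self, width):
        if self.rng.random() < self.prob:
            return ToyHFlip(width)
        return ToyNoOp()


policy = ToyRandomFlip(prob=1.0, rng=random.Random(0))  # prob=1.0: always flip
op = policy.get_transform(width=100)

# The returned operation is deterministic, so it can be reused on any extra
# data that must stay aligned with the image.
boxes_corners = op.apply_coords([(10, 20), (30, 40)])
keypoints = op.apply_coords([(0, 5)])
```

The randomness lives only in the policy. Once an operation has been returned, every piece of data passed through it receives exactly the same change.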

Write New Augmentations

Most 2D augmentations only need to know about the input image. Such an augmentation can be implemented easily like this:

import numpy as np  # T is imported as in the basic usage example

class MyColorAugmentation(T.Augmentation):
    def get_transform(self, image):
        r = np.random.rand(2)
        return T.ColorTransform(lambda x: x * r[0] + r[1] * 10)

class MyCustomResize(T.Augmentation):
    def get_transform(self, image):
        old_h, old_w = image.shape[:2]
        new_h, new_w = int(old_h * np.random.rand()), int(old_w * 1.5)
        return T.ResizeTransform(old_h, old_w, new_h, new_w)

augs = MyCustomResize()
transform = augs(input)

In addition to the image, any attributes of the given AugInput can be used as long as they are part of the function signature, e.g.:

class MyCustomCrop(T.Augmentation):
    def get_transform(self, image, sem_seg):
        # decide where to crop using both image and sem_seg
        return T.CropTransform(...)

augs = MyCustomCrop()
assert hasattr(input, "image") and hasattr(input, "sem_seg")
transform = augs(input)

New transform operations can also be added by subclassing T.Transform.

Advanced Usage

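We give a few examples of advanced usages that are enabled by our system. These options can be interesting for new research, although changing them is usually not needed for standard use cases.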

Custom transform strategy

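In addition to returning the augmented data, detectron2's Augmentation also returns the **operations** as a T.Transform. This allows users to apply custom transform strategies to their data. We use keypoint data as an example.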

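Keypoints are (x, y) coordinates, but they are not trivial to augment because of the semantic meaning they carry. Such meaning is only known to the users, so users may want to augment them manually by looking at the returned transform. For example, when an image is horizontally flipped, we would like to swap the keypoint annotations for "left eye" and "right eye". This can be done as follows (it is included by default in detectron2's default data loader):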

# augs, input are defined as in previous examples
transform = augs(input)  # type: T.Transform
keypoints_xy = transform.apply_coords(keypoints_xy)   # transform the coordinates

# get a list of all transforms that were applied
transforms = T.TransformList([transform]).transforms
# check if it is flipped for odd number of times
do_hflip = sum(isinstance(t, T.HFlipTransform) for t in transforms) % 2 == 1
if do_hflip:
    keypoints_xy = keypoints_xy[flip_indices_mapping]
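
The parity check and the index swap above can be sketched without detectron2. The three-keypoint layout, FLIP_INDICES, and flip_keypoints are assumptions made up for this illustration.

```python
# Hypothetical keypoint layout: index 0 = nose, 1 = left eye, 2 = right eye.
# After a flip, slot i takes the keypoint stored at FLIP_INDICES[i].
FLIP_INDICES = [0, 2, 1]


def flip_keypoints(keypoints_xy, width, num_hflips):
    """Apply `num_hflips` horizontal flips, then fix the left/right labels."""
    for _ in range(num_hflips):
        keypoints_xy = [(width - x, y) for x, y in keypoints_xy]
    if num_hflips % 2 == 1:  # an even number of flips cancels out
        keypoints_xy = [keypoints_xy[i] for i in FLIP_INDICES]
    return keypoints_xy


original = [(50, 10), (70, 8), (45, 8)]  # nose, left eye, right eye
once = flip_keypoints(original, width=100, num_hflips=1)
twice = flip_keypoints(original, width=100, num_hflips=2)
```

With one flip, the "left eye" slot ends up holding the mirrored position of what used to be the right eye. With two flips, both the coordinates and the labels return to their original values, which is why only an odd count triggers the swap.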

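As another example, keypoint annotations often have a "visibility" field. A sequence of augmentations might move a visible keypoint outside the image boundary (e.g., with cropping), but then bring it back inside the boundary afterwards (e.g., with image padding). If users decide to label such keypoints "invisible", then the visibility check has to happen after every transform step. This can be achieved by: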

transform = augs(input)  # type: T.TransformList
assert isinstance(transform, T.TransformList)
for t in transform.transforms:
    keypoints_xy = t.apply_coords(keypoints_xy)
    # parenthesize each comparison: `&` binds more tightly than `>=` and `<=`
    visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)

# btw, detectron2's `transform_keypoint_annotations` function chooses to label such keypoints "visible":
# keypoints_xy = transform.apply_coords(keypoints_xy)
# visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)
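
The difference between checking after every step and checking once at the end can be seen in a small library-free sketch. The crop-then-pad pipeline, the shift and inside helpers, and the per-step image sizes are assumptions chosen for illustration.

```python
def shift(dx, dy):
    """Return a coordinate operation that translates points by (dx, dy)."""
    return lambda pts: [(x + dx, y + dy) for x, y in pts]


def inside(pt, width, height):
    x, y = pt
    return 0 <= x <= width and 0 <= y <= height


# Hypothetical pipeline: crop away the left 20 px, then pad 20 px back on the
# left. Each step is paired with the image size (W, H) that holds after it.
steps = [(shift(-20, 0), (80, 100)), (shift(20, 0), (100, 100))]

keypoints = [(10, 50), (60, 50)]
visible_every_step = [True, True]
for op, (w, h) in steps:
    keypoints = op(keypoints)
    visible_every_step = [
        v and inside(p, w, h) for v, p in zip(visible_every_step, keypoints)
    ]

# Checking only once at the end misses the keypoint that left the image midway.
visible_at_end = [inside(p, 100, 100) for p in keypoints]
```

The first keypoint falls outside the cropped image and is later padded back in, so the per-step check marks it invisible while the end-only check still reports it as visible.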

Geometrically invert the transform

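If images are pre-processed by augmentations before inference, the predicted results (such as segmentation masks) are localized on the augmented image. We would like to invert the applied augmentation with the inverse() API to obtain results on the original image: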

transform = augs(input)
pred_mask = make_prediction(input.image)
inv_transform = transform.inverse()
pred_mask_orig = inv_transform.apply_segmentation(pred_mask)
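
The idea behind inverting a sequence of geometric operations can be sketched on plain coordinates. ToyShift, ToyScale, and ToyList are hypothetical classes for illustration only; the sketch assumes every step has an exact inverse, which is not true of every augmentation.

```python
class ToyShift:
    """Hypothetical coordinate transform with an exact inverse."""

    def __init__(self, dx, dy):
        self.dx, self.dy = dx, dy

    def apply_coords(self, pts):
        return [(x + self.dx, y + self.dy) for x, y in pts]

    def inverse(self):
        return ToyShift(-self.dx, -self.dy)


class ToyScale:
    """Hypothetical uniform scaling about the origin."""

    def __init__(self, factor):
        self.factor = factor

    def apply_coords(self, pts):
        return [(x * self.factor, y * self.factor) for x, y in pts]

    def inverse(self):
        return ToyScale(1 / self.factor)


class ToyList:
    """Applies transforms in order; its inverse undoes them in reverse order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def apply_coords(self, pts):
        for t in self.transforms:
            pts = t.apply_coords(pts)
        return pts

    def inverse(self):
        return ToyList([t.inverse() for t in reversed(self.transforms)])


pipeline = ToyList([ToyShift(-20, -10), ToyScale(2)])
pred_on_augmented = pipeline.apply_coords([(30, 40)])
pred_on_original = pipeline.inverse().apply_coords(pred_on_augmented)
```

Reversing the order matters: undoing the scale before the shift maps the point back to where it started, while inverting the steps in the original order would not.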

Add new data types

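T.Transform supports a few common data types to transform, including images, coordinates, masks, boxes, and polygons. It allows registering new data types, e.g.: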

from typing import Any

@T.HFlipTransform.register_type("rotated_boxes")
def func(flip_transform: T.HFlipTransform, rotated_boxes: Any):
    # do the work
    return flipped_rotated_boxes

t = T.HFlipTransform(width=800)
transformed_rotated_boxes = t.apply_rotated_boxes(rotated_boxes)  # func will be called
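
A registration mechanism of this kind can be sketched in a few lines. ToyTransform and ToyHFlipTransform are hypothetical stand-ins, and the (cx, cy, w, h, angle) box layout is an assumption for the example; the sketch shows the general pattern, not detectron2's actual implementation.

```python
class ToyTransform:
    """Hypothetical base class that lets users attach `apply_<type>` methods."""

    @classmethod
    def register_type(cls, data_type):
        def decorator(func):
            # Attach the handler as a method on this class only.
            setattr(cls, "apply_" + data_type, func)
            return func

        return decorator


class ToyHFlipTransform(ToyTransform):
    def __init__(self, width):
        self.width = width


@ToyHFlipTransform.register_type("rotated_boxes")
def flip_rotated_boxes(transform, rotated_boxes):
    # Boxes are (cx, cy, w, h, angle_degrees); a horizontal flip mirrors the
    # center and negates the angle.
    return [
        (transform.width - cx, cy, w, h, -angle)
        for cx, cy, w, h, angle in rotated_boxes
    ]


t = ToyHFlipTransform(width=800)
flipped = t.apply_rotated_boxes([(100, 50, 40, 20, 30)])
```

Because the handler is attached to the subclass, other transform classes are unaffected and need their own registration for the new data type.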

Extend T.AugInput

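An augmentation can only access attributes available in the given input. T.AugInput defines "image", "boxes", and "sem_seg", which are sufficient for common augmentation strategies to decide how to augment. If they are not sufficient, a custom implementation is needed.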

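By re-implementing the "transform()" method in AugInput, it is also possible to augment different fields in ways that depend on each other. Such a use case is uncommon (e.g., post-processing bounding boxes based on augmented masks), but it is allowed by the system.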
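
As a rough illustration of interdependent fields, the sketch below recomputes a box from the transformed mask instead of transforming the box on its own. ToyMaskInput is a hypothetical class that does not subclass T.AugInput, and the mask is simplified to a list of points.

```python
class ToyMaskInput:
    """Hypothetical input whose box is always derived from its mask points."""

    def __init__(self, mask_points):
        self.mask_points = mask_points
        self.box = self._tight_box(mask_points)

    @staticmethod
    def _tight_box(points):
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        return (min(xs), min(ys), max(xs), max(ys))

    def transform(self, tfm):
        # Transform the mask first, then derive the box from the result
        # rather than transforming the box independently.
        self.mask_points = tfm(self.mask_points)
        self.box = self._tight_box(self.mask_points)


def rotate_90(points):
    """Rotate points by 90 degrees about the origin."""
    return [(-y, x) for x, y in points]


toy_input = ToyMaskInput([(0, 0), (4, 0), (4, 2)])
box_before = toy_input.box
toy_input.transform(rotate_90)
box_after = toy_input.box
```

Deriving the box from the transformed mask keeps it tight after rotations, where transforming the four box corners and re-enclosing them can produce a looser box.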