detectron2.structures¶

class detectron2.structures.Boxes(tensor: torch.Tensor)¶

Bases: object

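This structure stores a list of boxes as an Nx4 torch.Tensor. It supports some common methods about boxes (area, clip, nonempty, etc.), and also behaves like a Tensor (supports indexing, to(device), .device, and iteration over all boxes).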

tensor¶

A float matrix of Nx4. Each row is (x1, y1, x2, y2).

Type: torch.Tensor

__getitem__(item) → detectron2.structures.Boxes ¶

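Parameters: item – int, slice, or a BoolTensor
Returns: Boxes – Create a new Boxes by indexing.

The following usage is allowed:

new_boxes = boxes[3]: returns a Boxes that contains only one box.
new_boxes = boxes[2:10]: returns a slice of boxes.
new_boxes = boxes[vector], where vector is a torch.BoolTensor with length = len(boxes). Nonzero elements in the vector will be selected.

Note that the returned Boxes might share storage with this Boxes, subject to PyTorch's indexing semantics.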

__init__(tensor: torch.Tensor)¶

Parameters: tensor (Tensor[float]) – an Nx4 matrix. Each row is (x1, y1, x2, y2).

__iter__()¶: Yield a box as a Tensor of shape (4,) at a time.

area() → torch.Tensor ¶

Computes the area of all the boxes.

Returns: torch.Tensor – a vector with the area of each box.

classmethod cat(boxes_list: List[Boxes]) → detectron2.structures.Boxes [source]¶

Concatenates a list of Boxes into a single Boxes.

Parameters: boxes_list (list[Boxes]) –
Returns: Boxes – the concatenated Boxes

clip(box_size: Tuple[int, int]) → None ¶

Clip (in place) the boxes by limiting x coordinates to the range [0, width] and y coordinates to the range [0, height].

Parameters: box_size (height, width) – The clipping box's size.
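The clipping rule can be illustrated on a single box in plain Python. This is only a sketch of the arithmetic, not the library's implementation: `clip_box` is a hypothetical helper, and unlike `Boxes.clip` it returns a new tuple instead of modifying anything in place.

```python
def clip_box(box, box_size):
    """Clip one (x1, y1, x2, y2) box to an image of size (height, width)."""
    height, width = box_size
    x1, y1, x2, y2 = box
    # x coordinates are limited to [0, width], y coordinates to [0, height].
    return (
        min(max(x1, 0), width),
        min(max(y1, 0), height),
        min(max(x2, 0), width),
        min(max(y2, 0), height),
    )


# A box that sticks out on the left, right, and bottom of an 80x100 image.
clipped = clip_box((-5.0, 10.0, 120.0, 90.0), box_size=(80, 100))
```

Note that `box_size` is given as (height, width) while the box itself is ordered x first, so the width bounds the x coordinates.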

clone() → detectron2.structures.Boxes ¶

Clone the Boxes.

Returns: Boxes

property device¶

get_centers() → torch.Tensor ¶

Returns: The box centers in an Nx2 array of (x, y).

inside_box(box_size: Tuple[int, int], boundary_threshold: int = 0) → torch.Tensor ¶

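Parameters

box_size (height, width) – Size of the reference box.
boundary_threshold (int) – Boxes that extend beyond the reference box boundary by more than boundary_threshold are considered "outside".

Returns

A binary vector indicating whether each box is inside the reference box.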

nonempty(threshold: float = 0.0) → torch.Tensor ¶

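Find boxes that are non-empty. A box is considered empty if any of its sides is no larger than the threshold.

Returns: Tensor – a binary vector which represents whether each box is empty (False) or non-empty (True).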
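As a rough illustration of `area()` and `nonempty()`, the per-box arithmetic can be written in plain Python. The helper names below are hypothetical, and the strict `>` comparison is an assumption that matches the "no larger than the threshold" wording above.

```python
def box_area(box):
    """Area of one (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def is_nonempty(box, threshold=0.0):
    """True when both sides are strictly larger than the threshold."""
    x1, y1, x2, y2 = box
    return (x2 - x1) > threshold and (y2 - y1) > threshold


boxes = [(0.0, 0.0, 4.0, 3.0), (2.0, 2.0, 2.0, 5.0)]  # second box has zero width
areas = [box_area(b) for b in boxes]
keep = [is_nonempty(b) for b in boxes]
```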

scale(scale_x: float, scale_y: float) → None ¶: Scale the box with horizontal and vertical scaling factors.

to(device: torch.device)¶

class detectron2.structures.BoxMode(value)¶

Bases: enum.IntEnum

Enum of different ways to represent a box.

XYXY_ABS = 0¶

XYWH_ABS = 1¶

XYXY_REL = 2¶

XYWH_REL = 3¶

XYWHA_ABS = 4¶

static convert(box: Union[List[float], Tuple[float, …], torch.Tensor, numpy.ndarray], from_mode: detectron2.structures.BoxMode, to_mode: detectron2.structures.BoxMode) → Union[List[float], Tuple[float, …], torch.Tensor, numpy.ndarray][source]¶

Parameters

box – can be a k-tuple, k-list, or an Nxk array/tensor, where k = 4 or 5
from_mode (BoxMode) –
to_mode (BoxMode) –

Returns

The converted box of the same type.
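To make the difference between two of the modes concrete, the following plain-Python sketch converts between the XYWH_ABS and XYXY_ABS layouts for a single box. The helper names are hypothetical and only illustrate the arithmetic; they are not part of the library.

```python
def xywh_to_xyxy(box):
    """(x, y, width, height) -> (x1, y1, x2, y2), absolute coordinates."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) -> (x, y, width, height), absolute coordinates."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


xyxy = xywh_to_xyxy([10, 20, 30, 40])
round_trip = xyxy_to_xywh(xyxy)
```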

detectron2.structures.pairwise_iou(boxes1: detectron2.structures.Boxes, boxes2: detectron2.structures.Boxes) → torch.Tensor ¶

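Given two lists of boxes of size N and M, compute the IoU (intersection over union) between all N x M pairs of boxes. The box order must be (xmin, ymin, xmax, ymax).

Parameters

boxes1 (Boxes) – the first Boxes, containing N boxes.
boxes2 (Boxes) – the second Boxes, containing M boxes.

Returns

Tensor – IoU, sized [N,M].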
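The N x M result can be pictured with a plain-Python sketch that loops over every pair. This shows the IoU definition only, under the assumption that pairs with no overlap get an IoU of 0; it is not the vectorized library code, and `pairwise_iou_sketch` is a hypothetical name.

```python
def intersection_area(a, b):
    """Overlap area of two (xmin, ymin, xmax, ymax) boxes, 0 if disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def pairwise_iou_sketch(boxes1, boxes2):
    """Return a nested list of size [N][M] with the IoU of every pair."""
    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            inter = intersection_area(a, b)
            union = area(a) + area(b) - inter
            row.append(inter / union if inter > 0 else 0.0)
        result.append(row)
    return result


iou = pairwise_iou_sketch([(0, 0, 2, 2)], [(1, 1, 3, 3), (5, 5, 6, 6)])
```

In the example, the first pair overlaps in a 1 x 1 square, so the IoU is 1 / (4 + 4 - 1) = 1/7; the second pair is disjoint.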

detectron2.structures.pairwise_ioa(boxes1: detectron2.structures.Boxes, boxes2: detectron2.structures.Boxes) → torch.Tensor ¶

Similar to pairwise_iou(), but computes the IoA (intersection divided by the area of boxes2).

Parameters

boxes1 (Boxes) – the first Boxes, containing N boxes.
boxes2 (Boxes) – the second Boxes, containing M boxes.

Returns

Tensor – IoA, sized [N,M].
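The only change from IoU is the denominator, as the following plain-Python sketch shows. It assumes that disjoint pairs get an IoA of 0, and `pairwise_ioa_sketch` is a hypothetical name rather than the library function.

```python
def pairwise_ioa_sketch(boxes1, boxes2):
    """Return a nested list [N][M]: intersection area / area of the boxes2 box."""
    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            w = min(a[2], b[2]) - max(a[0], b[0])
            h = min(a[3], b[3]) - max(a[1], b[1])
            inter = max(w, 0.0) * max(h, 0.0)
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            row.append(inter / area_b if inter > 0 else 0.0)
        result.append(row)
    return result


# The first boxes2 entry lies fully inside the boxes1 box; the second overlaps by a quarter.
ioa = pairwise_ioa_sketch([(0, 0, 4, 4)], [(1, 1, 2, 2), (3, 3, 5, 5)])
```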

detectron2.structures.pairwise_point_box_distance(points: torch.Tensor, boxes: detectron2.structures.Boxes)¶

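Pairwise distance between N points and M boxes. The distance between a point and a box is represented by the distance from the point to the 4 edges of the box. The distances are all positive when the point is inside the box.

Parameters

points – Nx2 coordinates. Each row is (x, y)
boxes – M boxes

Returns

Tensor – distances of size (N, M, 4). The 4 values are the distances from the point to the left, top, right, and bottom edges of the box.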
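A plain-Python sketch of the four distances follows. It assumes the distances are signed, so a point outside the box yields at least one negative value, which is consistent with "all positive when the point is inside". The function names are hypothetical.

```python
def point_box_distances(point, box):
    """Signed distances from (x, y) to the left, top, right, bottom edges."""
    x, y = point
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)


def pairwise_point_box_distance_sketch(points, boxes):
    """Nested list of shape [N][M], each entry a 4-tuple of distances."""
    return [[point_box_distances(p, b) for b in boxes] for p in points]


# One point inside the box and one outside (above and to the left of it).
d = pairwise_point_box_distance_sketch([(2, 3), (0, 0)], [(1, 1, 5, 5)])
```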

class detectron2.structures.ImageList(tensor: torch.Tensor, image_sizes: List[Tuple[int, int]])¶

Bases: object

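Structure that holds a list of images (of possibly varying sizes) as a single tensor. This works by padding the images to the same size. The original size of each image is stored in image_sizes.

image_sizes¶

Each tuple is (h, w). During tracing, it becomes list[Tensor] instead.

Type: list[tuple[int, int]]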

__getitem__(idx) → torch.Tensor ¶

Access an individual image at its original size.

Parameters: idx – int or slice
Returns: Tensor – an image of shape (H, W) or (C_1, …, C_K, H, W), where K >= 1

__init__(tensor: torch.Tensor, image_sizes: List[Tuple[int, int]])¶

Parameters

tensor (Tensor) – of shape (N, H, W) or (N, C_1, …, C_K, H, W), where K >= 1
image_sizes (list[tuple[int, int]]) – Each tuple is (h, w). It can be smaller than (H, W) due to padding.

property device¶

static from_tensors(tensors: List[torch.Tensor], size_divisibility: int = 0, pad_value: float = 0.0, padding_constraints: Optional[Dict[str, int]] = None) → detectron2.structures.ImageList [source]¶

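Parameters

tensors – a tuple or list of torch.Tensor, each of shape (Hi, Wi) or (C_1, …, C_K, Hi, Wi), where K >= 1. The tensors will be padded to the same shape with pad_value.
size_divisibility (int) – If size_divisibility > 0, add padding to ensure that the common height and width are divisible by size_divisibility. This depends on the model; many models need a divisibility of 32.
pad_value (float) – value to pad with.
padding_constraints (optional[Dict]) – If given, it follows the format {"size_divisibility": int, "square_size": int}, where size_divisibility overrides the value above if present, and square_size indicates the size of the square padding if square_size > 0.

Returns

An ImageList.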
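The effect of size_divisibility on the padded shape can be sketched in plain Python: take the largest height and width in the batch and round each up to the next multiple. `padded_size` is a hypothetical helper that covers only this arithmetic and ignores padding_constraints.

```python
def padded_size(image_sizes, size_divisibility=0):
    """Common (height, width) that every (h, w) image would be padded to."""
    max_h = max(h for h, w in image_sizes)
    max_w = max(w for h, w in image_sizes)
    if size_divisibility > 0:
        # Round up to the nearest multiple of size_divisibility.
        max_h = (max_h + size_divisibility - 1) // size_divisibility * size_divisibility
        max_w = (max_w + size_divisibility - 1) // size_divisibility * size_divisibility
    return max_h, max_w


size = padded_size([(480, 640), (500, 333)], size_divisibility=32)
```

Here the tallest image has height 500, which rounds up to 512, while the widest has width 640, which is already a multiple of 32.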

to(*args: Any, **kwargs: Any) → detectron2.structures.ImageList ¶

class detectron2.structures.Instances(image_size: Tuple[int, int], **kwargs: Any)¶

Bases: object

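This class represents a list of instances in an image. It stores the attributes of instances (e.g., boxes, masks, labels, scores) as "fields". All fields must have the same __len__, which is the number of instances.

All other (non-field) attributes of this class are considered private: they must start with "_" and are not modifiable by a user.

Some basic usage:

Set/get/check a field:
```
instances.gt_boxes = Boxes(...)
print(instances.pred_masks)  # a tensor of shape (N, H, W)
print('gt_masks' in instances)
```

len(instances) returns the number of instances.
Indexing: instances[indices] applies the indexing to all fields and returns a new Instances. Typically, indices is an integer vector of indices or a binary mask of length num_instances:
```
category_3_detections = instances[instances.pred_classes == 3]
confident_detections = instances[instances.scores > 0.9]
```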

__getitem__(item: Union[int, slice, torch.BoolTensor]) → detectron2.structures.Instances ¶

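Parameters: item – an index-like object that will be used to index all the fields.
Returns: If item is a string, return the data in the corresponding field. Otherwise, return an Instances where all fields are indexed by item.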

__init__(image_size: Tuple[int, int], **kwargs: Any)¶

Parameters

image_size (height, width) – the spatial size of the image.
kwargs – fields to add to this Instances.

static cat(instance_lists: List[Instances]) → detectron2.structures.Instances [source]¶

Parameters: instance_lists (list[Instances]) –
Returns: Instances

get(name: str) → Any¶: Returns the field called name.

get_fields() → Dict[str, Any]¶

Returns: dict – a dict which maps names (str) to the data of the fields

Modifying the returned dict will modify this instance.

has(name: str) → bool ¶

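Returns: bool – whether a field called name exists.

property image_size¶: Returns: tuple: height, width

remove(name: str) → None ¶: Remove the field called name.

set(name: str, value: Any) → None ¶: Set the field called name to value. The length of value must be the number of instances, and must agree with other existing fields in this object.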

to(*args: Any, **kwargs: Any) → detectron2.structures.Instances ¶

Returns: Instances – all fields are moved by calling to(device), if the field has this method.

class detectron2.structures.Keypoints(keypoints: Union[torch.Tensor, numpy.ndarray, List[List[float]]])¶

Bases: object

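Stores keypoint annotation data. GT Instances have a gt_keypoints property containing the x, y location and visibility flag of each keypoint. This tensor has shape (N, K, 3), where N is the number of instances and K is the number of keypoints per instance.

The visibility flag follows the COCO format and must be one of three integers:

v=0: not labeled (in which case x=y=0)
v=1: labeled but not visible
v=2: labeled and visible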

__getitem__(item: Union[int, slice, torch.BoolTensor]) → detectron2.structures.Keypoints ¶

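Create a new Keypoints by indexing on this Keypoints.

The following usage is allowed:

new_kpts = kpts[3]: returns a Keypoints which contains only one instance.
new_kpts = kpts[2:10]: returns a slice of key points.
new_kpts = kpts[vector], where vector is a torch.ByteTensor with length = len(kpts). Nonzero elements in the vector will be selected.

Note that the returned Keypoints might share storage with this Keypoints, subject to PyTorch's indexing semantics.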

__init__(keypoints: Union[torch.Tensor, numpy.ndarray, List[List[float]]])¶

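Parameters: keypoints – a tensor, numpy array, or list of the x, y, and visibility of each keypoint. The shape should be (N, K, 3), where N is the number of instances and K is the number of keypoints per instance.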

static cat(keypoints_list: List[Keypoints]) → detectron2.structures.Keypoints [source]¶

Concatenates a list of Keypoints into a single Keypoints.

Parameters: keypoints_list (list[Keypoints]) –
Returns: Keypoints – the concatenated Keypoints

property device¶

to(*args: Any, **kwargs: Any) → detectron2.structures.Keypoints ¶

to_heatmap(boxes: torch.Tensor, heatmap_size: int) → torch.Tensor ¶

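Convert keypoint annotations to a heatmap of one-hot labels for training, as described in Mask R-CNN.

Parameters

boxes – Nx4 tensor, the boxes to draw the keypoints to

Returns

heatmaps – a tensor of shape (N, K); each element is an integer spatial label in the range [0, heatmap_size**2 - 1] for each keypoint in the input.
valid – a tensor of shape (N, K) containing whether each keypoint is in the roi or not.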
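One plausible way to arrive at an integer label in [0, heatmap_size**2 - 1] is to bin the keypoint inside its box and flatten the (row, column) bin index. The sketch below makes several assumptions that the text above does not confirm: row-major flattening, treating a keypoint as valid only when its visibility flag is greater than 0, and mapping points on the right or bottom edge to the last bin. `keypoint_to_heatmap_label` is a hypothetical helper.

```python
import math


def keypoint_to_heatmap_label(x, y, v, box, heatmap_size):
    """Return (label, valid) for one keypoint (x, y, v) and one (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    # Map continuous image coordinates to heatmap bin indices.
    col = math.floor((x - x1) * heatmap_size / (x2 - x1))
    row = math.floor((y - y1) * heatmap_size / (y2 - y1))
    # Assumption: points exactly on the right/bottom edge go to the last bin.
    if x == x2:
        col = heatmap_size - 1
    if y == y2:
        row = heatmap_size - 1
    in_roi = 0 <= col < heatmap_size and 0 <= row < heatmap_size
    valid = in_roi and v > 0
    # Assumption: row-major flattening; invalid keypoints get label 0.
    label = row * heatmap_size + col if valid else 0
    return label, valid


box = (0, 0, 10, 10)
inside = keypoint_to_heatmap_label(3, 7, 2, box, heatmap_size=5)
corner = keypoint_to_heatmap_label(10, 10, 2, box, heatmap_size=5)
outside = keypoint_to_heatmap_label(12, 5, 2, box, heatmap_size=5)
unlabeled = keypoint_to_heatmap_label(3, 7, 0, box, heatmap_size=5)
```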

detectron2.structures.heatmaps_to_keypoints(maps: torch.Tensor, rois: torch.Tensor) → torch.Tensor ¶

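Extract predicted keypoint locations from heatmaps.

Parameters

maps (Tensor) – (#ROIs, #keypoints, POOL_H, POOL_W). The predicted heatmap of logits for each ROI and each keypoint.
rois (Tensor) – (#ROIs, 4). The box of each ROI.

Returns

Tensor of shape (#ROIs, #keypoints, 4), with the last dimension corresponding to (x, y, logit, score) for each keypoint.

When converting discrete pixel indices in an NxN image to a continuous keypoint coordinate, we maintain consistency with Keypoints.to_heatmap() by using the conversion from Heckbert 1990: c = d + 0.5, where d is a discrete coordinate and c is a continuous coordinate.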
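The conversion c = d + 0.5 places a continuous coordinate at the center of its discrete bin. The sketch below shows that conversion, its inverse d = floor(c), and a hypothetical helper that maps a heatmap column index back to an image x coordinate by scaling with the ROI width. The scaling step is an assumption for illustration, not a description of the library's exact code.

```python
import math


def discrete_to_continuous(d):
    """Center of discrete bin d in continuous coordinates."""
    return d + 0.5


def continuous_to_discrete(c):
    """Index of the bin containing continuous coordinate c."""
    return math.floor(c)


def heatmap_col_to_image_x(d, roi_x1, roi_x2, heatmap_width):
    """Hypothetical helper: map heatmap column d to an x coordinate in the image."""
    return roi_x1 + discrete_to_continuous(d) * (roi_x2 - roi_x1) / heatmap_width


# Column 1 of a 4-wide heatmap over an ROI spanning x in [10, 30].
x = heatmap_col_to_image_x(1, roi_x1=10, roi_x2=30, heatmap_width=4)
```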

class detectron2.structures.BitMasks(tensor: Union[torch.Tensor, numpy.ndarray])¶

Bases: object

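This class stores the segmentation masks for all objects in one image, in the form of bitmaps.

tensor¶: bool Tensor of N,H,W, representing N instances in the image.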

__getitem__(item: Union[int, slice, torch.BoolTensor]) → detectron2.structures.BitMasks ¶

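Returns: BitMasks – Create a new BitMasks by indexing.

The following usage is allowed:

new_masks = masks[3]: returns a BitMasks which contains only one mask.
new_masks = masks[2:10]: returns a slice of masks.
new_masks = masks[vector], where vector is a torch.BoolTensor with length = len(masks). Nonzero elements in the vector will be selected.

Note that the returned object might share storage with this object, subject to PyTorch's indexing semantics.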

__init__(tensor: Union[torch.Tensor, numpy.ndarray])¶

Parameters: tensor – bool Tensor of N,H,W, representing N instances in the image.

static cat(bitmasks_list: List[BitMasks]) → detectron2.structures.BitMasks [source]¶

Concatenates a list of BitMasks into a single BitMasks.

Parameters: bitmasks_list (list[BitMasks]) –
Returns: BitMasks – the concatenated BitMasks

crop_and_resize(boxes: torch.Tensor, mask_size: int) → torch.Tensor ¶

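Crop each bitmask by the given box, and resize the results to (mask_size, mask_size). This can be used to prepare training targets for Mask R-CNN. It has less reconstruction error than rasterization with polygons. We observed no difference in accuracy, however, and BitMasks requires more memory to store all the masks.

Parameters

boxes (Tensor) – Nx4 tensor storing the boxes for each mask
mask_size (int) – the size of the rasterized mask.

Returns

Tensor – a bool tensor of shape (N, mask_size, mask_size), where N is the number of predicted boxes for this image.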

property device¶

static from_polygon_masks(polygon_masks: Union[PolygonMasks, List[List[numpy.ndarray]]], height: int, width: int) → detectron2.structures.BitMasks [source]¶

Parameters

polygon_masks (list[list[ndarray]] or PolygonMasks) –
height (int) –
width (int) –

static from_roi_masks(roi_masks: detectron2.structures.ROIMasks, height: int, width: int) → detectron2.structures.BitMasks [source]¶

Parameters

roi_masks –
height (int) –
width (int) –

get_bounding_boxes() → detectron2.structures.Boxes ¶

Returns: Boxes – tight bounding boxes around bitmasks. If a mask is empty, its bounding box will be all zero.
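A tight box around a single bitmask can be sketched in plain Python by locating the first and last rows and columns that contain a set pixel. The `+ 1` on the maximum indices, which makes the box enclose the pixels as unit squares, is an assumption of this sketch, and `tight_bounding_box` is a hypothetical helper.

```python
def tight_bounding_box(mask):
    """mask: H x W nested list of 0/1 values. Returns (x1, y1, x2, y2)."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        # An empty mask yields an all-zero box.
        return (0, 0, 0, 0)
    # Assumption: the max indices are extended by 1 so the box covers whole pixels.
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = tight_bounding_box(mask)
empty_box = tight_bounding_box([[0, 0], [0, 0]])
```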

nonempty() → torch.Tensor ¶

Find masks that are non-empty.

Returns

Tensor – a BoolTensor which represents whether each mask is empty (False) or non-empty (True).

to(*args: Any, **kwargs: Any) → detectron2.structures.BitMasks ¶

class detectron2.structures.PolygonMasks(polygons: List[List[Union[torch.Tensor, numpy.ndarray]]])¶

Bases: object

This class stores the segmentation masks for all objects in one image, in the form of polygons.

polygons¶: list[list[ndarray]]. Each ndarray is a float64 vector representing a polygon.

__getitem__(item: Union[int, slice, List[int], torch.BoolTensor]) → detectron2.structures.PolygonMasks ¶

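Supports indexing over the instances and returns a PolygonMasks object. item can be:

An integer. It will return an object with only one instance.
A slice. It will return an object with the selected instances.
A list[int]. It will return an object with the selected instances, corresponding to the indices in the list.
A vector mask of type BoolTensor, whose length is num_instances. It will return an object with the instances whose mask is nonzero.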

__init__(polygons: List[List[Union[torch.Tensor, numpy.ndarray]]])¶

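Parameters: polygons (list[list[np.ndarray]]) – The first level of the list corresponds to individual instances, the second level to all the polygons that compose the instance, and the third level to the polygon coordinates. The third level array should have the format [x0, y0, x1, y1, …, xn, yn] (n >= 3).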

__iter__() → Iterator[List[numpy.ndarray]]¶

Yields: list[ndarray] – the polygons for one instance. Each ndarray is a float vector representing a polygon.

area()¶

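Computes the area of the masks. This works only with polygons and uses the shoelace formula: https://stackoverflow.com/questions/24467972/calculate-area-of-polygon-given-x-y-coordinates

Returns: Tensor – a vector with the area of each instance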
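The shoelace formula can be written in a few lines of plain Python for the flat [x0, y0, x1, y1, …] polygon format used here. Summing the polygon areas to get the area of one instance is an assumption of this sketch, and both function names are hypothetical.

```python
def polygon_area(poly):
    """Shoelace formula for a flat [x0, y0, x1, y1, ...] vertex list."""
    xs = poly[0::2]
    ys = poly[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def instance_area(polygons):
    """Assumption: an instance's area is the sum of its polygons' areas."""
    return sum(polygon_area(p) for p in polygons)


rectangle = [3, 2, 7, 2, 7, 4, 3, 4]  # 4 x 2 rectangle
triangle = [0, 0, 4, 0, 0, 3]         # right triangle with legs 4 and 3
total = instance_area([rectangle, triangle])
```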

static cat(polymasks_list: List[PolygonMasks]) → detectron2.structures.PolygonMasks [source]¶

Concatenates a list of PolygonMasks into a single PolygonMasks.

Parameters: polymasks_list (list[PolygonMasks]) –
Returns: PolygonMasks – the concatenated PolygonMasks

crop_and_resize(boxes: torch.Tensor, mask_size: int) → torch.Tensor ¶

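Crop each mask by the given box, and resize the results to (mask_size, mask_size). This can be used to prepare training targets for Mask R-CNN.

Parameters

boxes (Tensor) – Nx4 tensor storing the boxes for each mask
mask_size (int) – the size of the rasterized mask.

Returns

Tensor – a bool tensor of shape (N, mask_size, mask_size), where N is the number of predicted boxes for this image.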

property device¶

get_bounding_boxes() → detectron2.structures.Boxes ¶

Returns: Boxes – tight bounding boxes around polygon masks.

nonempty() → torch.Tensor ¶

Find masks that are non-empty.

Returns: Tensor – a BoolTensor which represents whether each mask is empty (False) or non-empty (True).

to(*args: Any, **kwargs: Any) → detectron2.structures.PolygonMasks ¶

detectron2.structures.polygons_to_bitmask(polygons: List[numpy.ndarray], height: int, width: int) → numpy.ndarray ¶

Parameters

polygons (list[ndarray]) – each array has shape (Nx2,)
height (int) –
width (int) –

Returns

ndarray – a bool mask of shape (height, width)

class detectron2.structures.ROIMasks(tensor: torch.Tensor)¶

Bases: object

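Represents masks by N smaller masks defined in some ROIs. Once ROI boxes are given, a full-image bitmask can be obtained by "pasting" each mask onto the region defined by the corresponding ROI box.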

__getitem__(item) → detectron2.structures.ROIMasks ¶

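Returns: ROIMasks – Create a new ROIMasks by indexing.

The following usage is allowed:

new_masks = masks[2:10]: returns a slice of masks.
new_masks = masks[vector], where vector is a torch.BoolTensor with length = len(masks). Nonzero elements in the vector will be selected.

Note that the returned object might share storage with this object, subject to PyTorch's indexing semantics.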

__init__(tensor: torch.Tensor)¶

Parameters: tensor – (N, M, M) mask tensor that defines the mask within each ROI.

property device¶

to(device: torch.device) → detectron2.structures.ROIMasks ¶

to_bitmasks(boxes: torch.Tensor, height, width, threshold=0.5)¶: Parameters: see the documentation of paste_masks_in_image().

class detectron2.structures.RotatedBoxes(tensor: torch.Tensor)¶

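Bases: detectron2.structures.Boxes

This structure stores a list of rotated boxes as an Nx5 torch.Tensor. It supports some common methods about boxes (area, clip, nonempty, etc.), and also behaves like a Tensor (supports indexing, to(device), .device, and iteration over all boxes).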

__getitem__(item) → detectron2.structures.RotatedBoxes ¶

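Returns: RotatedBoxes – Create a new RotatedBoxes by indexing.

The following usage is allowed:

new_boxes = boxes[3]: returns a RotatedBoxes which contains only one box.
new_boxes = boxes[2:10]: returns a slice of boxes.
new_boxes = boxes[vector], where vector is a torch.ByteTensor with length = len(boxes). Nonzero elements in the vector will be selected.

Note that the returned RotatedBoxes might share storage with this RotatedBoxes, subject to PyTorch's indexing semantics.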

__init__(tensor: torch.Tensor)¶

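Parameters: tensor (Tensor[float]) – an Nx5 matrix. Each row is (x_center, y_center, width, height, angle), in which angle is represented in degrees. While there is no strict range restriction for it, the recommended principal range is [-180, 180) degrees.

Assume we have a horizontal box B = (x_center, y_center, width, height), where width is along the x-axis and height is along the y-axis. The rotated box B_rot (x_center, y_center, width, height, angle) can be seen as:

When angle == 0: B_rot == B
When angle > 0: B_rot is obtained by rotating B around its center by $|angle|$ degrees CCW (counter-clockwise);
When angle < 0: B_rot is obtained by rotating B around its center by $|angle|$ degrees CW (clockwise).

Mathematically, since the right-handed coordinate system for image space is (y, x), where y is top->down and x is left->right, the 4 vertices of the rotated rectangle $(yr_i, xr_i)$ (i = 1, 2, 3, 4) can be obtained from the vertices of the horizontal rectangle $(y_i, x_i)$ (i = 1, 2, 3, 4) in the following way ($\theta = angle*\pi/180$ is the angle in radians, and $(y_c, x_c)$ is the center of the rectangle):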

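$\begin{aligned}yr_i &= \cos(\theta) (y_i - y_c) - \sin(\theta) (x_i - x_c) + y_c,\\xr_i &= \sin(\theta) (y_i - y_c) + \cos(\theta) (x_i - x_c) + x_c,\end{aligned}$

which is the standard rigid-body rotation transformation.

Intuitively, the angle is (1) the CCW rotation angle from the y-axis in image space to the height vector of the box (top->down in the box's local coordinate system), and (2) the CCW rotation angle from the x-axis in image space to the width vector of the box (left->right in the box's local coordinate system).

More intuitively, consider the following horizontal box ABCD represented in (x1, y1, x2, y2): (3, 2, 7, 4), covering the [3, 7] x [2, 4] region of the continuous coordinate system, which looks like this: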

O--------> x
|
|  A---B
|  |   |
|  D---C
|
v y

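Note that each capital letter here represents a 0-dimensional geometric point rather than a "square pixel".

In the example above, using (x, y) to represent a point, we have:

$O = (0, 0), A = (3, 2), B = (7, 2), C = (7, 4), D = (3, 4)$

We name vector AB = vector DC the width vector in the box's local coordinate system, and vector AD = vector BC the height vector in the box's local coordinate system. Initially, when angle = 0 degrees, they are aligned with the positive directions of the x-axis and y-axis in image space, respectively.

For better illustration, we denote the center of the box as E,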

O--------> x
|
|  A---B
|  | E |
|  D---C
|
v y

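where the center E = ((3+7)/2, (2+4)/2) = (5, 3).

Also,

$width = |AB| = |CD| = 7 - 3 = 4, height = |AD| = |BC| = 4 - 2 = 2.$

Therefore, the corresponding representation of the same shape in the rotated box format (x_center, y_center, width, height, angle) is:

(5, 3, 4, 2, 0),

Now, let's consider (5, 3, 4, 2, 90), which by definition is rotated by 90 degrees CCW. It looks like this: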

O--------> x
|   B-C
|   | |
|   |E|
|   | |
|   A-D
v y

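The center E is still located at the same point (5, 3), while the vertices ABCD are rotated by 90 degrees CCW around E: A = (4, 5), B = (4, 1), C = (6, 1), D = (6, 5)

Here, 90 degrees can be seen as the CCW angle from the y-axis to vector AD or vector BC (the top->down height vector in the box's local coordinate system), or the CCW angle from the x-axis to vector AB or vector DC (the left->right width vector in the box's local coordinate system).

$width = |AB| = |CD| = 5 - 1 = 4, height = |AD| = |BC| = 6 - 4 = 2.$

Next, how about (5, 3, 4, 2, -90)? By definition, it is rotated by 90 degrees CW. It looks like this: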

O--------> x
|   D-A
|   | |
|   |E|
|   | |
|   C-B
v y

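The center E is still located at the same point (5, 3), while the vertices ABCD are rotated by 90 degrees CW around E: A = (6, 1), B = (6, 5), C = (4, 5), D = (4, 1)

$width = |AB| = |CD| = 5 - 1 = 4, height = |AD| = |BC| = 6 - 4 = 2.$

This covers exactly the same region as (5, 3, 4, 2, 90) does, and their IoU will be 1. However, these two will generate different RoI Pooling results and should not be treated as an identical box.

On the other hand, it is easy to see that (X, Y, W, H, A) is identical to (X, Y, W, H, A+360N) for any integer N. For example, (5, 3, 4, 2, 270) is identical to (5, 3, 4, 2, -90), because rotating the shape 270 degrees CCW is equivalent to rotating the same shape 90 degrees CW.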
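The vertex coordinates quoted for the 90 and -90 degree boxes can be checked by applying the rotation formula for $(yr_i, xr_i)$ directly. The following plain-Python sketch does that; `rotated_box_vertices` is a hypothetical helper, and the rounding to 6 decimals is only there to hide floating-point noise such as cos(90°) not being exactly 0.

```python
import math


def rotated_box_vertices(box):
    """Return vertices A, B, C, D as (x, y) for (x_center, y_center, width, height, angle)."""
    xc, yc, w, h, angle = box
    theta = math.radians(angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Vertices A, B, C, D of the horizontal (angle = 0) box.
    corners = [
        (xc - w / 2, yc - h / 2),
        (xc + w / 2, yc - h / 2),
        (xc + w / 2, yc + h / 2),
        (xc - w / 2, yc + h / 2),
    ]
    rotated = []
    for x, y in corners:
        # Rigid-body rotation in the (y, x) image coordinate system.
        yr = cos_t * (y - yc) - sin_t * (x - xc) + yc
        xr = sin_t * (y - yc) + cos_t * (x - xc) + xc
        rotated.append((round(xr, 6), round(yr, 6)))
    return rotated


ccw_90 = rotated_box_vertices((5, 3, 4, 2, 90))
cw_90 = rotated_box_vertices((5, 3, 4, 2, -90))
ccw_270 = rotated_box_vertices((5, 3, 4, 2, 270))
```

With these inputs, `ccw_90` reproduces A = (4, 5), B = (4, 1), C = (6, 1), D = (6, 5), and `ccw_270` matches `cw_90` vertex by vertex.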

We can rotate further to get (5, 3, 4, 2, 180) or (5, 3, 4, 2, -180):

O--------> x
|
|  C---D
|  | E |
|  B---A
|
v y

$\begin{aligned}A = (7, 4), B = (3, 4), C = (3, 2), D = (7, 2),\\width = |AB| = |CD| = 7 - 3 = 4, height = |AD| = |BC| = 4 - 2 = 2.\end{aligned}$

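Finally, here is a very inaccurate (heavily quantized) illustration of what (5, 3, 4, 2, 60) looks like, in case anyone wonders:

O--------> x
|     B\
|    /  C
|   /E /
|  A  /
|   `D
v y

It is still a rectangle with center (5, 3), width 4, and height 2, but its angle (and thus its orientation) is somewhere between (5, 3, 4, 2, 0) and (5, 3, 4, 2, 90).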

__iter__()¶: Yield a box as a Tensor of shape (5,) at a time.

area() → torch.Tensor ¶

Computes the area of all the boxes.

Returns: torch.Tensor – a vector with the area of each box.

classmethod cat(boxes_list: List[RotatedBoxes]) → detectron2.structures.RotatedBoxes [source]¶

Concatenates a list of RotatedBoxes into a single RotatedBoxes.

Parameters: boxes_list (list[RotatedBoxes]) –
Returns: RotatedBoxes – the concatenated RotatedBoxes

clip(box_size: Tuple[int, int], clip_angle_threshold: float = 1.0) → None ¶

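Clip (in place) the boxes by limiting x coordinates to the range [0, width] and y coordinates to the range [0, height].

For RRPN: only boxes that are almost horizontal, within a tolerance of clip_angle_threshold, are clipped, to maintain backward compatibility.

Rotated boxes beyond this threshold are not clipped, for two reasons:

There are potentially multiple ways to clip a rotated box to make it fit within the image.
It is tricky to make the entire rectangular box fit within the image and still not leave out pixels of interest.

Therefore, we rely on ops like RoIAlignRotated to safely handle this.

Parameters

box_size (height, width) – The clipping box's size.
clip_angle_threshold – If and only if abs(normalized(angle)) <= clip_angle_threshold (in degrees), we do the clipping as horizontal boxes.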
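The thresholded behavior can be sketched for one box in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: near-horizontal boxes are treated as axis-aligned, converted to corner form, clamped, and converted back; other boxes are returned untouched. The helper `clip_rotated_box` is hypothetical, returns a new tuple rather than clipping in place, and leaves the angle value as given.

```python
def clip_rotated_box(box, box_size, clip_angle_threshold=1.0):
    """Clip one (x_center, y_center, width, height, angle) box to (height, width)."""
    height, width = box_size
    xc, yc, w, h, angle = box
    # Normalize the angle to [-180, 180) before comparing with the threshold.
    normalized = (angle + 180.0) % 360.0 - 180.0
    if abs(normalized) > clip_angle_threshold:
        return box  # rotated boxes beyond the threshold are not clipped
    # Treat the box as horizontal: clamp its corners to the image.
    x1 = min(max(xc - w / 2.0, 0.0), width)
    y1 = min(max(yc - h / 2.0, 0.0), height)
    x2 = min(max(xc + w / 2.0, 0.0), width)
    y2 = min(max(yc + h / 2.0, 0.0), height)
    # Convert back to center/size form.
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1, angle)


nearly_horizontal = clip_rotated_box((50, 40, 120, 20, 0.5), box_size=(80, 100))
rotated = clip_rotated_box((50, 40, 120, 20, 30), box_size=(80, 100))
```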

clone() → detectron2.structures.RotatedBoxes ¶

Clone the RotatedBoxes.

Returns: RotatedBoxes

property device¶

get_centers() → torch.Tensor ¶

Returns: The box centers in an Nx2 array of (x, y).

inside_box(box_size: Tuple[int, int], boundary_threshold: int = 0) → torch.Tensor ¶

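Parameters

box_size (height, width) – Size of the reference box covering [0, width] x [0, height]
boundary_threshold (int) – Boxes that extend beyond the reference box boundary by more than boundary_threshold are considered "outside".

For RRPN, it might not be necessary to call this function, since it is common for rotated boxes to extend outside the image boundaries (the clip function only clips near-horizontal boxes).

Returns: a binary vector indicating whether each box is inside the reference box.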

nonempty(threshold: float = 0.0) → torch.Tensor ¶

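Find boxes that are non-empty. A box is considered empty if any of its sides is no larger than the threshold.

Returns: Tensor – a binary vector which represents whether each box is empty (False) or non-empty (True).

normalize_angles() → None ¶: Restrict angles to the range of [-180, 180) degrees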
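One common way to wrap an angle into [-180, 180) is a shift, a modulo, and a shift back. The sketch below shows that arithmetic for a single value; `normalize_angle` is a hypothetical helper, and it relies on Python's `%` returning a non-negative result for a positive modulus.

```python
def normalize_angle(angle):
    """Wrap an angle in degrees into the half-open range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


examples = [normalize_angle(a) for a in (45, 270, 180, -180, -190)]
```

For instance, 270 maps to -90 (the same orientation), and 180 maps to -180 because the upper bound is excluded.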

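scale(scale_x: float, scale_y: float) → None ¶: Scale the rotated box with horizontal and vertical scaling factors. Note: when scale_factor_x != scale_factor_y and the angle is not a multiple of 90 degrees, the rotated box does not preserve its rectangular shape under the resize transformation. Instead, the shape is a parallelogram (that has skew). Here we make an approximation by fitting a rotated rectangle to the parallelogram.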

to(device: torch.device)¶

detectron2.structures.pairwise_iou_rotated(boxes1: detectron2.structures.RotatedBoxes, boxes2: detectron2.structures.RotatedBoxes) → None ¶

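Given two lists of rotated boxes of size N and M, compute the IoU (intersection over union) between all N x M pairs of boxes. The box order must be (x_center, y_center, width, height, angle).

Parameters

boxes1 (RotatedBoxes) – the first RotatedBoxes, containing N rotated boxes.
boxes2 (RotatedBoxes) – the second RotatedBoxes, containing M rotated boxes.

Returns

Tensor – IoU, sized [N,M].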