MultiScaleRoIAlign — Torchvision 0.22 documentation

class torchvision.ops.MultiScaleRoIAlign(featmap_names: List[str], output_size: Union[int, Tuple[int], List[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)[source]

Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

It infers the scale of the pooling via the heuristic specified in eq. 1 of the Feature Pyramid Network paper. The keyword-only parameters canonical_scale and canonical_level correspond respectively to 224 and k0 = 4 in eq. 1, and have the following meaning: canonical_level is the target level of the pyramid from which to pool a region of interest with w x h = canonical_scale x canonical_scale.
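The level-assignment heuristic described above can be sketched as a few lines of Python. This is a minimal reimplementation for illustration only (the function name, and the k_min/k_max clamping bounds, are assumptions, not part of the torchvision API):

```python
import math

def assign_fpn_level(w: float, h: float,
                     canonical_scale: int = 224,
                     canonical_level: int = 4,
                     k_min: int = 2, k_max: int = 5) -> int:
    # Eq. 1 of the FPN paper: k = floor(k0 + log2(sqrt(w * h) / 224)),
    # clamped to the available pyramid levels [k_min, k_max].
    k = math.floor(canonical_level + math.log2(math.sqrt(w * h) / canonical_scale))
    return max(k_min, min(k_max, k))

# A 224x224 RoI maps to the canonical level k0 = 4:
print(assign_fpn_level(224, 224))  # -> 4
# A 112x112 RoI (half the canonical scale) maps one level down:
print(assign_fpn_level(112, 112))  # -> 3
```

Intuitively, every halving of the RoI's side length moves the pooling one level down the pyramid, toward higher-resolution feature maps.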

Parameters:

* featmap_names (List[str]) – the names of the feature maps (keys of the input dict) that will be used for the pooling.
* output_size (int or Tuple[int] or List[int]) – output size of the pooled region.
* sampling_ratio (int) – sampling ratio for ROIAlign.
* canonical_scale (int, optional) – canonical scale for the level-assignment heuristic. Default: 224.
* canonical_level (int, optional) – canonical pyramid level k0 for the level-assignment heuristic. Default: 4.

Examples:

>>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
>>> i = OrderedDict()
>>> i['feat1'] = torch.rand(1, 5, 64, 64)
>>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
>>> i['feat3'] = torch.rand(1, 5, 16, 16)
>>> # create some random bounding boxes
>>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
>>> # original image size, before computing the feature maps
>>> image_sizes = [(512, 512)]
>>> output = m(i, [boxes], image_sizes)
>>> print(output.shape)
torch.Size([6, 5, 3, 3])

forward(x: Dict[str, Tensor], boxes: List[Tensor], image_shapes: List[Tuple[int, int]]) → Tensor[source]

Parameters:

* x (Dict[str, Tensor]) – feature maps for each level. They are assumed to all have the same number of channels, but they can have different sizes.
* boxes (List[Tensor[N, 4]]) – boxes to be used to perform the pooling operation, in (x1, y1, x2, y2) format and in absolute image coordinates, one tensor per image in the batch.
* image_shapes (List[Tuple[int, int]]) – the sizes of each image before they have been fed to a CNN to obtain feature maps. This is used to infer the scale factor of each feature level.

Returns:

result (Tensor) – the pooled RoI features, concatenated over all images in the batch along the first dimension.