Mask R-CNN

Last Updated : 20 May, 2026

Mask R-CNN is an advanced computer vision model used for object detection and instance segmentation. It extends Faster R-CNN by adding a mask prediction branch that generates pixel-level segmentation masks for detected objects.

Instance Segmentation

Instance segmentation identifies and separates each individual object present in an image by assigning unique pixel-level masks to every object instance.
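The toy sketch below makes "one mask per instance" concrete by splitting an instance-ID map into separate binary masks. It only illustrates the output representation, not how Mask R-CNN computes its masks. The data and the function name `instance_masks` are hypothetical.

```python
# Toy 4 x 5 instance-ID map: 0 = background, 1 and 2 are two separate
# objects (they could even belong to the same class, e.g. two people).
label_map = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 2, 2],
]

def instance_masks(labels):
    """Split an instance-ID map into one binary mask per object instance."""
    ids = sorted({v for row in labels for v in row if v != 0})
    return {
        inst: [[1 if v == inst else 0 for v in row] for row in labels]
        for inst in ids
    }

masks = instance_masks(label_map)
for inst, mask in masks.items():
    area = sum(map(sum, mask))
    print(f"instance {inst}: {area} pixels")  # instance 1: 4, instance 2: 5
```

Unlike semantic segmentation, which would give both objects the same class label, each instance here keeps its own mask.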

Instance Segmentation

Working of Mask R-CNN

Mask R-CNN extends the two-stage Faster R-CNN architecture by adding a separate mask prediction branch for instance segmentation. It detects objects, classifies them and generates pixel-level segmentation masks for each object instance.

Mask R-CNN Architecture

Mask R-CNN was proposed by Kaiming He et al. in 2017 as an extension of Faster R-CNN for instance segmentation. Along with object detection and bounding box prediction, it also generates a binary segmentation mask for each detected object.
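The paper trains the classification, box and mask outputs jointly. It uses a multi-task loss defined on each sampled RoI, L = L_cls + L_box + L_mask. For an RoI whose ground-truth class is k, L_mask is the average per-pixel binary cross-entropy (with a sigmoid), computed only on the k-th of the K predicted masks. Below is a minimal pure-Python sketch of that term. It assumes masks are stored as nested lists. The function names are hypothetical, and L_cls and L_box are passed in as plain numbers.

```python
import math

def bce_with_logits(z, y):
    # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel BCE on the ground-truth class's mask only.

    mask_logits: list of K masks (one per class), each an m x m list of logits.
    gt_mask:     m x m list of 0/1 targets for this RoI.
    gt_class:    index k of the RoI's ground-truth class.
    """
    logits = mask_logits[gt_class]  # masks of the other classes are ignored
    m_h, m_w = len(gt_mask), len(gt_mask[0])
    total = sum(
        bce_with_logits(logits[i][j], gt_mask[i][j])
        for i in range(m_h)
        for j in range(m_w)
    )
    return total / (m_h * m_w)

def total_loss(l_cls, l_box, l_mask):
    # Multi-task loss per sampled RoI: L = L_cls + L_box + L_mask
    return l_cls + l_box + l_mask

# K = 3 classes, m = 2; the RoI's ground-truth class is 1.
logits = [
    [[5.0, -5.0], [3.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-9.0, 9.0], [1.0, 2.0]],
]
gt = [[1, 0], [0, 1]]
l_mask = mask_loss(logits, gt, gt_class=1)
print(round(l_mask, 4))  # 0.6931 (= ln 2, since the class-1 logits are all zero)
```

Because each class has its own sigmoid mask, classes do not compete for pixels. The class decision is left to the classification branch.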

Mask R-CNN Architecture

Main components include:

1. Backbone Network

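The backbone network extracts feature maps from the input image. The paper evaluates ResNet-C4, which takes features from the final convolutional layer of a ResNet's 4th stage, and ResNet-FPN, which adds a Feature Pyramid Network (FPN) to produce feature maps at several scales.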
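With an FPN backbone, each RoI is pooled from a single pyramid level chosen by its size. The sketch below uses the level-assignment rule from the FPN paper, k = ⌊k0 + log2(√(wh) / 224)⌋ with k0 = 4. Clamping k to the range P2 to P5 is an implementation choice assumed here, and `fpn_level` is a hypothetical name.

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Pick the pyramid level P_k to pool features from for a w x h RoI."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    # Assumed: clamp to the levels that actually exist (P2..P5).
    return max(k_min, min(k_max, k))

for size in (32, 112, 224, 448, 1000):
    print(size, "->", "P%d" % fpn_level(size, size))
# 32 -> P2, 112 -> P3, 224 -> P4, 448 -> P5, 1000 -> P5
```

Small RoIs are mapped to high-resolution levels and large RoIs to coarse ones, so every RoI is pooled from features of a suitable scale.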

Mask R-CNN backbone architecture

2. Region Proposal Network

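The Region Proposal Network (RPN) slides over the backbone feature maps. At every position, it scores a fixed set of anchor boxes of different scales and aspect ratios as object or background and regresses refinements to their coordinates. The highest-scoring refined anchors become the candidate object regions (proposals).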
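The anchor-generation step can be sketched as placing anchors of each size and aspect ratio at the centre of every feature-map cell, projected back to image pixels through the feature stride. The single size of 32 and the ratios 1:2, 1:1 and 2:1 are illustrative defaults, not values taken from this article. The height/width convention for the ratio is also an assumption, and `make_anchors` is a hypothetical helper.

```python
import math

def make_anchors(feat_h, feat_w, stride, sizes=(32,), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors in image pixels for every feature-map cell.

    Each anchor has area size**2 and height/width ratio r (assumed convention).
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Centre of feature-map cell (i, j) projected back onto the image.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for size in sizes:
                for r in ratios:
                    w = size / math.sqrt(r)  # keeps w * h == size**2
                    h = size * math.sqrt(r)  # gives h / w == r
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))  # 18 = 2 * 3 cells * 1 size * 3 ratios
print(anchors[1])    # (-8.0, -8.0, 24.0, 24.0): the square anchor at cell (0, 0)
```

The RPN then predicts an objectness score and four box offsets for each of these anchors.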

Anchor Generation Mask R-CNN

3. Mask Representation

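The mask branch is a small fully convolutional network applied to each Region of Interest (RoI). It predicts one m × m binary mask for each of the K classes. Only the mask of the RoI's class is used, so mask prediction and class prediction are decoupled.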
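At inference time, the paper resizes the m × m soft mask of the predicted class to the size of the detected box and binarizes it at a threshold of 0.5. The sketch below uses nearest-neighbour resizing for brevity, which is a simplification; the probabilities and the name `paste_mask` are hypothetical.

```python
def paste_mask(mask_probs, box_w, box_h, threshold=0.5):
    """Resize an m x m soft mask to box_h x box_w pixels and binarize it.

    Nearest-neighbour resizing is a simplification; implementations
    typically interpolate before thresholding.
    """
    m_h, m_w = len(mask_probs), len(mask_probs[0])
    binary = []
    for y in range(box_h):
        # Source row whose cell contains the centre of output row y.
        sy = min(m_h - 1, int((y + 0.5) * m_h / box_h))
        row = []
        for x in range(box_w):
            sx = min(m_w - 1, int((x + 0.5) * m_w / box_w))
            row.append(1 if mask_probs[sy][sx] >= threshold else 0)
        binary.append(row)
    return binary

# Hypothetical 2 x 2 sigmoid output for the predicted class, pasted into a 4 x 4 box.
probs = [[0.9, 0.1],
         [0.2, 0.7]]
for row in paste_mask(probs, box_w=4, box_h=4):
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [0, 0, 1, 1]
# [0, 0, 1, 1]
```

The resulting binary mask is then placed at the box's position in the full image.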

4. RoI Align

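RoI Align serves the same purpose as RoI Pooling: it produces a fixed-size feature map for each region proposal. Unlike RoI Pooling, it never rounds region or bin coordinates to integers, which avoids misalignment between the proposal and the features extracted for it. It works in the following steps: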

RoI Align

Given the h × w feature map from the previous convolutional layer, divide the region proposal's area on this map into M × N bins of equal size. The bin boundaries are kept as real-valued coordinates instead of being rounded to integers.
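In the paper, RoI Align computes feature values at regularly spaced sampling points inside each bin using bilinear interpolation and then aggregates them with max or average. The single-channel sketch below uses the average and 2 × 2 sampling points per bin. It also assumes that `feat[r][c]` sits at the real-valued point (r, c). The function names `bilinear` and `roi_align` are hypothetical, and this is not any library's implementation.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D map `feat` (list of rows) at real-valued (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp the sample point to the valid area of the map.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0][x0] * (1 - dy) * (1 - dx) + feat[y0][x1] * (1 - dy) * dx
            + feat[y1][x0] * dy * (1 - dx) + feat[y1][x1] * dy * dx)

def roi_align(feat, box, out_h, out_w, sampling=2):
    """RoI Align on one channel; box = (x1, y1, x2, y2) in feature-map coordinates."""
    x1, y1, x2, y2 = box  # no rounding: coordinates stay real-valued
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Average over sampling x sampling evenly spaced points in bin (i, j).
            for sy in range(sampling):
                for sx in range(sampling):
                    y = y1 + (i + (sy + 0.5) / sampling) * bin_h
                    x = x1 + (j + (sx + 0.5) / sampling) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / (sampling * sampling))
        out.append(row)
    return out

# Toy 5 x 5 map whose value is 10 * row + col.
feat = [[10 * r + c for c in range(5)] for r in range(5)]
pooled = roi_align(feat, box=(0.6, 0.6, 2.6, 2.6), out_h=2, out_w=2)
print([[round(v, 3) for v in row] for row in pooled])  # [[12.1, 13.1], [22.1, 23.1]]
```

The proposal starts at the fractional coordinate 0.6, and each output value reflects that exact position. RoI Pooling would first snap the box to integer cells and lose that offset.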

The original paper reports an inference speed of about 5 fps for Mask R-CNN. This is only a small overhead over Faster R-CNN, despite the added segmentation branch.

Applications

Advantages

Limitations