Mask R-CNN | RDL Research Database

Abstract

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.
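The parallel-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it uses no neural network, omits bounding-box regression, and every name in it (`Grid`, `RegionPrediction`, `recognition_branch`, `mask_branch`, `predict_region`, `class_weights`) is hypothetical. It only shows the structural point that, for one candidate region, a recognition branch and a mask branch read the same features and neither consumes the other's output.

```python
import math
from dataclasses import dataclass
from typing import List

# Toy stand-in (assumption) for the feature map pooled from one candidate region.
Grid = List[List[float]]


@dataclass
class RegionPrediction:
    class_probs: List[float]   # output of the recognition branch
    mask: List[List[int]]      # output of the mask branch, same grid size as the input


def recognition_branch(features: Grid, class_weights: List[float]) -> List[float]:
    """Toy classifier: softmax over (weight * mean feature), one weight per class."""
    values = [v for row in features for v in row]
    mean = sum(values) / len(values)
    logits = [w * mean for w in class_weights]
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_branch(features: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Toy mask head: per-pixel sigmoid, then threshold to a binary mask."""
    return [
        [1 if 1.0 / (1.0 + math.exp(-v)) >= threshold else 0 for v in row]
        for row in features
    ]


def predict_region(features: Grid, class_weights: List[float]) -> RegionPrediction:
    # Both branches take the same features; they run independently of each other.
    return RegionPrediction(
        class_probs=recognition_branch(features, class_weights),
        mask=mask_branch(features),
    )


if __name__ == "__main__":
    toy_features = [[-2.0, 1.5], [0.3, -0.7]]
    print(predict_region(toy_features, class_weights=[0.0, 1.0, -1.0]))
```

In a real system both branches would be learned network heads; the fixed arithmetic here is only a placeholder so the sketch runs with the standard library alone.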

Keywords

Computer science, Minimum bounding box, Segmentation, Artificial intelligence, Object detection, Bounding overwatch, Overhead (engineering), Object (grammar), Task (project management), Code (set theory), Simple (philosophy), Suite, Pattern recognition (psychology), Computer vision, Image (mathematics), Image segmentation, Set (abstract data type)

Affiliated Institutions

Meta (Israel) IL

Related Publications

Path Aggregation Network for Instance Segmentation

Shu Liu, Lu Qi, Haifang Qin +2 more

The way that information propagates in neural networks is of great importance. In this paper, we propose Path Aggregation Network (PANet) aiming at boosting information flow in ...

2018 | 2018 IEEE/CVF Conference on Computer ... | 7956 citations

FCOS: Fully Convolutional One-Stage Object Detection

Zhi Tian, Chunhua Shen, Hao Chen +1 more

We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. Almost all stat...

2019 | 2019 IEEE/CVF International Conferenc... | 5672 citations

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

Zhaohui Zheng, Ping Wang, Wei Liu +3 more

Bounding box regression is the crucial step in object detection. In existing methods, while ℓₙ-norm loss is widely adopted for bounding box regression, it is not tailored to the...

2020 | Proceedings of the AAAI Conference on... | 3685 citations

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell +1 more

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that...

2014 | 30615 citations

UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation

Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh +1 more

The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations...

2019 | IEEE Transactions on Medical Imaging | 3567 citations

Publication Info

Year: 2017
Type: preprint
Pages: 2980-2988
Citations: 27097
Access: Closed

Citation Metrics

OpenAlex: 27097
CrossRef: 24289

Cite This

APA Style

Kaiming He, Georgia Gkioxari, Piotr Dollár et al. (2017). Mask R-CNN. 2017 IEEE International Conference on Computer Vision (ICCV), 2980-2988. https://doi.org/10.1109/iccv.2017.322

Identifiers

DOI: 10.1109/iccv.2017.322

Data Quality

Data completeness: 77%