Cascade R-CNN: Delving Into High Quality Object Detection

Abstract

In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector trained with a low IoU threshold, e.g. 0.5, usually produces noisy detections. However, detection performance tends to degrade as the IoU threshold increases. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing positive samples, and 2) inference-time mismatch between the IoUs for which the detector is optimal and those of the input hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, is proposed to address these problems. It consists of a sequence of detectors trained with increasing IoU thresholds, to be sequentially more selective against close false positives. The detectors are trained stage by stage, leveraging the observation that the output of a detector is a good distribution for training the next higher quality detector. The resampling of progressively improved hypotheses guarantees that all detectors have a positive set of examples of equivalent size, reducing the overfitting problem. The same cascade procedure is applied at inference, enabling a closer match between the hypotheses and the detector quality of each stage. A simple implementation of the Cascade R-CNN is shown to surpass all single-model object detectors on the challenging COCO dataset. Experiments also show that the Cascade R-CNN is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength. The code is available at https://github.com/zhaoweicai/cascade-rcnn.
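
The resampling argument in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the `refine` function is a hypothetical stand-in for a stage's learned box regressor (it simply moves each box halfway toward the ground truth), and the thresholds 0.5, 0.6, 0.7 and the toy boxes are assumptions chosen for illustration. The sketch compares how many proposals count as positives when a fixed proposal set is scored at rising IoU thresholds versus when each stage receives the previous stage's refined boxes.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals: Sequence[Box], gt: Box, threshold: float) -> int:
    """Number of proposals whose IoU with the ground truth meets the threshold."""
    return sum(1 for p in proposals if iou(p, gt) >= threshold)


def refine(box: Box, gt: Box, step: float = 0.5) -> Box:
    """Hypothetical regressor stand-in: move each coordinate a fraction toward gt."""
    x1, y1, x2, y2 = (b + step * (g - b) for b, g in zip(box, gt))
    return (x1, y1, x2, y2)


def fixed_counts(proposals: Sequence[Box], gt: Box,
                 thresholds: Sequence[float]) -> List[int]:
    """Positives per threshold when the proposal set never changes."""
    return [count_positives(proposals, gt, u) for u in thresholds]


def cascade_counts(proposals: Sequence[Box], gt: Box,
                   thresholds: Sequence[float]) -> List[int]:
    """Positives per stage when each stage sees the previous stage's refined boxes."""
    counts = []
    current = list(proposals)
    for u in thresholds:
        counts.append(count_positives(current, gt, u))
        current = [refine(p, gt) for p in current]  # resample for the next stage
    return counts


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    proposals = [(2.0, 0.0, 12.0, 10.0), (3.0, 0.0, 13.0, 10.0), (4.0, 0.0, 14.0, 10.0)]
    thresholds = (0.5, 0.6, 0.7)  # assumed stage thresholds
    print("fixed proposals:", fixed_counts(proposals, gt, thresholds))
    print("cascade:        ", cascade_counts(proposals, gt, thresholds))
```

With the fixed proposal set the positive count shrinks as the threshold rises, while the cascaded set keeps its positives because the boxes improve from stage to stage. In a real detector the refinement comes from a trained regressor and is noisier than this toy update.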

Keywords

Overfitting, Detector, Computer science, Cascade, Artificial intelligence, False positive paradox, Object detection, Pattern recognition (psychology), Inference, Convolutional neural network, Algorithm, Computer vision, Artificial neural network, Telecommunications, Engineering

Affiliated Institutions

UC San Diego Health System US

Related Publications

FCOS: Fully Convolutional One-Stage Object Detection

Zhi Tian , Chunhua Shen , Hao Chen +1 more

We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. Almost all stat...

2019 2019 IEEE/CVF International Conferenc... 5672 citations

Focal Loss for Dense Object Detection

Tsung-Yi Lin , Priya Goyal , Ross Girshick +2 more

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations...

2018 IEEE Transactions on Pattern Analysis... 9004 citations

HOGgles: Visualizing Object Detection Features

Carl Vondrick , Aditya Khosla , Tomasz Malisiewicz +1 more

We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on 'HOG goggles' and perceive the visual world as a HO...

2013 284 citations

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

Zhaohui Zheng , Ping Wang , Wei Liu +3 more

Bounding box regression is the crucial step in object detection. In existing methods, while ℓn-norm loss is widely adopted for bounding box regression, it is not tailored to the...

2020 Proceedings of the AAAI Conference on... 3685 citations

Improving neural networks by preventing co-adaptation of feature detectors

Geoffrey E. Hinton , Nitish Srivastava , Alex Krizhevsky +2 more

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly...

2012 arXiv (Cornell University) 6630 citations

Publication Info

Year: 2018
Type: preprint
Pages: 6154-6162
Citations: 6294
Access: Closed

Citation Metrics

OpenAlex: 6294
Influential: 916
CrossRef: 5145

Cite This

APA Style

Zhaowei Cai, Nuno Vasconcelos (2018). Cascade R-CNN: Delving Into High Quality Object Detection. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6154-6162. https://doi.org/10.1109/cvpr.2018.00644

Identifiers

DOI: 10.1109/cvpr.2018.00644
arXiv: 1712.00726
