Abstract

In this work, we revisit the global average pooling layer proposed in [13] and shed light on how it explicitly enables a convolutional neural network (CNN) to have remarkable localization ability despite being trained only on image-level labels. While this technique was previously proposed as a means of regularizing training, we find that it actually builds a generic, localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotations. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite being trained only to solve a classification task.
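The abstract does not spell out how global average pooling exposes localization. A minimal sketch follows, assuming the class activation map formulation: the map for a class is the weighted sum of the last convolutional feature maps, using the weights of the linear layer that follows the pooling. The function names, the toy feature maps, and the weights are hypothetical and chosen only for illustration; a real network would supply learned values, and a bias term is omitted here.

```python
def global_average_pool(feature_maps):
    """Reduce each H x W feature map (nested lists) to its spatial mean."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def class_score(feature_maps, class_weights):
    """Linear class score computed on the pooled features (no bias, by assumption)."""
    pooled = global_average_pool(feature_maps)
    return sum(w * p for w, p in zip(class_weights, pooled))


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the feature maps at every spatial position."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical toy input: two 2 x 2 feature maps and one class's weights.
feature_maps = [
    [[0.0, 4.0], [0.0, 0.0]],  # unit 0 responds in the top-right corner
    [[2.0, 0.0], [0.0, 2.0]],  # unit 1 responds along the diagonal
]
class_weights = [1.0, 0.5]

cam = class_activation_map(feature_maps, class_weights)
score = class_score(feature_maps, class_weights)
```

In this toy case the largest entry of `cam` sits in the top-right corner, where the most heavily weighted unit responds. Because pooling and the weighted sum are both linear, the class score equals the spatial mean of the map, which is why the same weights that classify the image can also indicate where the evidence lies.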

Keywords

Discriminative model; Pooling; Artificial intelligence; Computer science; Convolutional neural network; Pattern recognition (psychology); Bounding overwatch; Minimum bounding box; Deep learning; Contextual image classification; Representation (politics); Image (mathematics); Simplicity; Object (grammar); Layer (electronics); Annotation

Related Publications

Network In Network

Abstract: We propose a novel deep network structure called Network In Network (NIN) to enhance model discriminability for local patches within the receptive field. The conventional con...

2014 arXiv (Cornell University) 1037 citations

Publication Info

Year: 2016
Type: article
Citations: 10334
Access: Closed

Citation Metrics

10334
OpenAlex

Cite This

Bolei Zhou, Aditya Khosla, Àgata Lapedriza et al. (2016). Learning Deep Features for Discriminative Localization. https://doi.org/10.1109/cvpr.2016.319

Identifiers

DOI: 10.1109/cvpr.2016.319