Abstract

Successful methods for visual object recognition typically rely on training datasets containing large numbers of richly annotated images. Detailed image annotation, e.g. by object bounding boxes, is however both expensive and often subjective. We describe a weakly supervised convolutional neural network (CNN) for object classification that relies only on image-level labels, yet can learn from cluttered scenes containing multiple objects. We quantify its object classification and object location prediction performance on the Pascal VOC 2012 (20 object classes) and the much larger Microsoft COCO (80 object classes) datasets. We find that the network (i) outputs accurate image-level labels, (ii) predicts approximate locations (but not extents) of objects, and (iii) performs comparably to its fully supervised counterparts trained with object bounding box annotation.

Keywords

Pascal (unit), Artificial intelligence, Convolutional neural network, Computer science, Bounding overwatch, Minimum bounding box, Object (grammar), Annotation, Pattern recognition (psychology), Cognitive neuroscience of visual object recognition, Object detection, Supervised learning, Contextual image classification, Artificial neural network, Computer vision, Image (mathematics), Machine learning

Publication Info

Year: 2015
Type: preprint
Citations: 915 (OpenAlex)
Access: Closed

Cite This

Maxime Oquab, Léon Bottou, Ivan Laptev et al. (2015). Is object localization for free? - Weakly-supervised learning with convolutional neural networks. https://doi.org/10.1109/cvpr.2015.7298668

Identifiers

DOI: 10.1109/cvpr.2015.7298668