Abstract
Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC 2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations, as opposed to the hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This requirement currently prevents the application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute mid-level image representations for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks between the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
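As a rough illustration of the transfer recipe the abstract describes, the sketch below reuses ImageNet-pretrained convolutional layers as a fixed mid-level feature extractor and trains only a new classification layer on the smaller target task. This is a minimal sketch, not the paper's exact adaptation-layer architecture or training procedure; it assumes PyTorch/torchvision, an AlexNet-style backbone, and a placeholder `num_target_classes`.

```python
# Minimal transfer-learning sketch (assumptions: PyTorch/torchvision,
# AlexNet-style backbone; not the paper's exact method).
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 20  # placeholder, e.g. the 20 PASCAL VOC object classes

# Load a CNN pre-trained on ImageNet.
backbone = models.alexnet(weights="IMAGENET1K_V1")

# Freeze the transferred layers so their parameters are reused, not re-estimated.
for p in backbone.parameters():
    p.requires_grad = False

# Replace the ImageNet-specific output layer with a new layer for the target task.
backbone.classifier[6] = nn.Linear(4096, num_target_classes)

# Optimize only the newly added parameters.
optimizer = torch.optim.SGD(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
)
criterion = nn.BCEWithLogitsLoss()  # multi-label loss for VOC-style annotations

# One training step on a dummy batch, just to show the data flow.
images = torch.randn(8, 3, 224, 224)
targets = torch.zeros(8, num_target_classes)
logits = backbone(images)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```

Because only the newly added layer is optimized, the number of parameters estimated from the small target dataset stays modest, which is why a representation learned on large-scale annotated data can help tasks with limited training data.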
Related Publications
Is object localization for free? - Weakly-supervised learning with convolutional neural networks
Successful methods for visual object recognition typically rely on training datasets containing lots of richly annotated images. Detailed image annotation, e.g. by object boundi...
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that...
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic diction...
ConceptLearner: Discovering visual concepts from weakly labeled image collections
Discovering visual knowledge from weakly labeled data is crucial to scale up computer vision recognition systems, since it is expensive to obtain fully labeled data for a large ...
Understanding deep image representations by inverting them
Image representations, from SIFT and Bag of Visual Words to Convolutional Neural Networks (CNNs), are a crucial component of almost any image understanding system. Nevertheless,...
Publication Info
- Year: 2014
- Type: article
- Pages: 1717-1724
- Citations: 3151
- Access: Closed
Identifiers
- DOI: 10.1109/cvpr.2014.222