Abstract

We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.
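
The abstract names the three ingredients of each layer only at a high level. As a rough, illustrative sketch (not the authors' implementation), the NumPy code below strings those ingredients together for one sub-layer: locally connected (untied) filtering, L2 pooling, and a simplified local contrast normalization. All sizes (receptive field, stride, number of maps, pooling width) are placeholder values rather than the paper's settings, and the sparse-autoencoder reconstruction objective and the distributed asynchronous SGD training are omitted.

# Illustrative sketch of one sub-layer: local filtering -> L2 pooling -> LCN.
# Sizes are placeholders, not the paper's settings.
import numpy as np

def local_filtering(image, rf_size=18, stride=10, n_maps=4, rng=None):
    """Untied (locally connected) linear filtering: each output location
    has its own filter bank; weights are not shared as in a convolution."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    out_h = (h - rf_size) // stride + 1
    out_w = (w - rf_size) // stride + 1
    # One independent filter bank per output location (untied weights).
    weights = rng.normal(0.0, 0.01, size=(out_h, out_w, n_maps, rf_size, rf_size))
    out = np.zeros((n_maps, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + rf_size,
                          j * stride:j * stride + rf_size]
            out[:, i, j] = np.tensordot(weights[i, j], patch, axes=([1, 2], [0, 1]))
    return out

def l2_pooling(maps, pool=2):
    """L2 pooling: square root of the sum of squares over each pooling region."""
    n, h, w = maps.shape
    h, w = h - h % pool, w - w % pool
    x = maps[:, :h, :w].reshape(n, h // pool, pool, w // pool, pool)
    return np.sqrt((x ** 2).sum(axis=(2, 4)) + 1e-8)

def local_contrast_normalization(maps, eps=1e-8):
    """Subtract the mean and divide by the standard deviation. For brevity
    the 'local' neighborhood here is the whole map."""
    mean = maps.mean(axis=(1, 2), keepdims=True)
    std = maps.std(axis=(1, 2), keepdims=True)
    return (maps - mean) / (std + eps)

if __name__ == "__main__":
    image = np.random.rand(200, 200)            # one unlabeled 200x200 input
    features = local_filtering(image)           # locally connected filtering
    pooled = l2_pooling(features)               # pooling for invariance
    normalized = local_contrast_normalization(pooled)
    print(normalized.shape)                     # (n_maps, pooled_h, pooled_w)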

Keywords

Artificial intelligence, Computer science, Autoencoder, Pattern recognition (psychology), Normalization (sociology), Detector, Pooling, Unsupervised learning, Feature extraction, Pixel, Feature learning, Deep learning, Computer vision, Machine learning


Publication Info

Year: 2012
Type: article
Pages: 507-514
Citations: 667
Access: Closed

Citation Metrics

OpenAlex: 667

Cite This

Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin et al. (2012). Building high-level features using large scale unsupervised learning. International Conference on Machine Learning, 507-514.