Scaling learning algorithms towards AI

Yoshua Bengio; Yann LeCun

Abstract

One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and empirical evidence suggesting that many popular approaches to non-parametric learning, particularly kernel methods, are fundamentally limited in their ability to learn complex high-dimensional functions. Our analysis focuses on two problems. First, kernel machines are shallow architectures, in which one large layer of simple template matchers is followed by a single layer of trainable coefficients. We argue that shallow architectures can be very inefficient in terms of required number of computational elements and examples. Second, we analyze a limitation of kernel machines with a local kernel, linked to the curse of dimensionality, that applies to supervised, unsupervised (manifold learning) and semi-supervised kernel machines. Using empirical results on invariant image recognition tasks, kernel methods are compared with deep architectures, in which lower-level features or concepts are progressively combined into more abstract and higher-level representations. We argue that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
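The "shallow architecture" described in the abstract can be illustrated with a minimal sketch of a Gaussian kernel machine: a first layer that matches the input against stored templates, followed by a single layer of coefficients. This is an illustration only, not the authors' code. The function names, the Gaussian kernel choice, the bandwidth `sigma`, and the hand-set coefficients `alphas` are all assumptions made for this example; in a real kernel machine the coefficients would be fit by a training procedure.

```python
import math

def gaussian_kernel(x, center, sigma=1.0):
    # Local kernel: the response decays with squared distance from the template.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def template_layer(x, templates, sigma=1.0):
    # Layer 1: one "template matcher" per stored example (hypothetical helper).
    return [gaussian_kernel(x, t, sigma) for t in templates]

def kernel_machine(x, templates, alphas, bias=0.0, sigma=1.0):
    # Layer 2: a single linear layer of coefficients over the activations.
    activations = template_layer(x, templates, sigma)
    return bias + sum(a * k for a, k in zip(alphas, activations))

# Two stored templates with hand-set (not learned) coefficients.
templates = [(0.0, 0.0), (1.0, 1.0)]
alphas = [1.0, -1.0]

near = kernel_machine((0.0, 0.0), templates, alphas)    # close to a template
far = kernel_machine((50.0, 50.0), templates, alphas)   # far from every template
```

With a local kernel, an input far from every template produces activations near zero, so the output collapses to the bias. This is one simple way to see why such a model has little to say beyond the immediate neighborhood of its training examples.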

Keywords

Artificial intelligence, Computer science, Machine learning, Kernel (algebra), Curse of dimensionality, Kernel method, Algorithm, Support vector machine, Mathematics

Related Publications

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Richard Zhang, Phillip Isola, Alexei A. Efros +2 more

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, ...

2018 10763 citations

A neural probabilistic language model

Yoshua Bengio, Réjean Ducharme, Pascal Vincent +1 more

A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of d...

2003 Journal of Machine Learning Research 2660 citations

A Survey on Multi-Task Learning

Yu Zhang, Qiang Yang

Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the genera...

2021 IEEE Transactions on Knowledge and Da... 1864 citations

Object Detection With Deep Learning: A Review

Zhong-Qiu Zhao, Peng Zheng, Shou-Tao Xu +1 more

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection ...

2019 IEEE Transactions on Neural Networks ... 5019 citations

Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines With Unlabeled Data

Liang Guo, Yaguo Lei, Saibo Xing +2 more

The success of intelligent fault diagnosis of machines relies on the following two conditions: 1) labeled data with fault information are available; and 2) the training and test...

2018 IEEE Transactions on Industrial Elect... 1138 citations

Publication Info

Year: 2007
Type: article
Citations: 927
Access: Closed

Cite This

APA Style

Yoshua Bengio, Yann LeCun (2007). Scaling learning algorithms towards AI.