Abstract
I propose a novel general principle for unsupervised learning of distributed nonredundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor, which tries to predict the unit from the remaining units. In turn, each unit tries to react to the environment such that it minimizes its predictability. This encourages each unit to filter "abstract concepts" out of the environmental input such that these concepts are statistically independent of those on which the other units focus. I discuss various simple yet potentially powerful implementations of the principle that aim at finding binary factorial codes (Barlow et al. 1989), i.e., codes where the probability of the occurrence of a particular input is simply the product of the probabilities of the corresponding code symbols. Such codes are potentially relevant for (1) segmentation tasks, (2) speeding up supervised learning, and (3) novelty detection. Methods for finding factorial codes automatically implement Occam's razor for finding codes using a minimal number of units. Unlike previous methods, the novel principle has a potential for removing not only linear but also nonlinear output redundancy. Illustrative experiments show that algorithms based on the principle of predictability minimization are practically feasible. The final part of this paper describes an entirely local algorithm that has a potential for learning unique representations of extended input sequences.
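The two opposing forces described in the abstract map naturally onto an alternating, adversarial-style update: one predictor per code unit is trained to predict that unit from the remaining units, while the encoder producing the code is trained to maximize exactly the prediction error the predictors minimize, pushing toward a factorial code in the sense of Barlow et al. (1989), where P(input) factorizes into the product of the individual code-symbol probabilities. Below is a minimal sketch of this interplay, not the paper's reference implementation; PyTorch, the sigmoid encoder, the layer sizes, the random toy inputs, and the alternating optimizers are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative toy dimensions (not from the paper).
n_in, n_code = 8, 4

# Encoder producing the code units; sigmoid keeps outputs in [0, 1],
# so maximizing squared prediction error favors near-binary values.
encoder = nn.Sequential(nn.Linear(n_in, n_code), nn.Sigmoid())

# One adaptive predictor per code unit: predicts unit i from the other units.
predictors = nn.ModuleList([nn.Linear(n_code - 1, 1) for _ in range(n_code)])

opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_pred = torch.optim.Adam(predictors.parameters(), lr=1e-3)

def drop_unit(y, i):
    # All code units except unit i: the input to the i-th predictor.
    return torch.cat([y[:, :i], y[:, i + 1:]], dim=1)

for step in range(5000):
    x = (torch.rand(64, n_in) > 0.5).float()  # toy binary input batch (stand-in data)

    # Force 1: each predictor learns to predict "its" unit from the remaining units.
    with torch.no_grad():
        y = encoder(x)
    pred_loss = sum(
        ((p(drop_unit(y, i)) - y[:, i:i + 1]) ** 2).mean()
        for i, p in enumerate(predictors)
    )
    opt_pred.zero_grad()
    pred_loss.backward()
    opt_pred.step()

    # Force 2: the code units adapt to maximize the very error the predictors
    # minimize, i.e. each unit tries to become unpredictable from the others.
    y = encoder(x)
    enc_loss = -sum(
        ((p(drop_unit(y, i)) - y[:, i:i + 1]) ** 2).mean()
        for i, p in enumerate(predictors)
    )
    opt_enc.zero_grad()
    enc_loss.backward()
    opt_enc.step()
```

The abstract mentions several implementations of the principle; the sketch above shows only the core predictor-versus-code-unit interplay, omitting the additional objectives and the sequence-learning variant the paper discusses.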
Publication Info
- Year: 1992
- Type: article
- Volume: 4
- Issue: 6
- Pages: 863-879
Identifiers
- DOI: 10.1162/neco.1992.4.6.863