Abstract
Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exact zeros, becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
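The Laplace prior corresponds to an L1 penalty of the form α Σ|w_i| added to the data error, and, as the abstract notes, weights whose sensitivity to the data error falls below the critical value are driven to exact zeros. The sketch below is only a minimal illustration of that mechanism using a proximal (soft-thresholding) update for the L1 term; it is not the paper's training procedure, and the function name `l1_regularized_step` and the constants `lr` and `alpha` are assumptions made for the example.

```python
import numpy as np

def l1_regularized_step(w, data_grad, lr=0.01, alpha=0.05):
    """One gradient step on the data error, followed by a proximal
    (soft-thresholding) update for the Laplace / L1 penalty alpha * sum(|w|).

    Illustrative sketch only: a weight whose pull from the data error is too
    small to overcome the constant pull of the Laplace prior ends up at
    exactly zero, so pruning falls out of the regularizer itself rather than
    a separate post-processing step.
    """
    w = w - lr * data_grad                                        # descend the data error
    return np.sign(w) * np.maximum(np.abs(w) - lr * alpha, 0.0)   # shrink toward zero, clip at 0

# Toy usage (made-up numbers): weights with little support from the data
# collapse to exact zeros, while well-supported weights merely shrink.
w = np.array([0.50, 0.02, -0.30, 0.001])
data_grad = np.array([0.10, 0.00, -0.05, 0.00])   # pretend the gradients stay fixed
for _ in range(100):
    w = l1_regularized_step(w, data_grad)
print(w)   # roughly [0.35, 0.0, -0.2, 0.0]; the weakly supported weights are exactly 0
```

The soft-thresholding step is what produces exact zeros; a Gaussian (L2) decay only scales weights toward zero and never prunes them outright, which is one reason the abstract contrasts the Laplace prior with a Gaussian regularizer.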
Related Publications
The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization...
A Practical Bayesian Framework for Backpropagation Networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between...
Fixing Weight Decay Regularization in Adam
We note that common implementations of adaptive gradient algorithms, such as Adam, limit the potential benefit of weight decay regularization, because the weights do not decay...
Bootstrapping with Noise: An Effective Regularization Technique
Bootstrap samples with noise are shown to be an effective smoothness and capacity control technique for training feedforward networks and for other statistical methods such as...
Computation with Infinite Neural Networks
For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a Gaussian...
Publication Info
- Year: 1995
- Type: article
- Volume: 7
- Issue: 1
- Pages: 117-143
- Citations: 386
- Access: Closed
Identifiers
- DOI: 10.1162/neco.1995.7.1.117