Abstract

Standard techniques for improving generalization in neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exactly zero, becomes an automatic consequence of regularization alone. The number of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
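As an illustrative sketch of the mechanism described in the abstract (the notation below is assumed for exposition and is not quoted from the paper), a Laplace prior p(w) proportional to exp(-α Σ_i |w_i|) turns the training objective into

    M(w) = E_D(w) + α Σ_i |w_i|

At a minimum of M, each weight falls into one of two classes:

    |∂E_D/∂w_i| = α   if w_i ≠ 0   (common sensitivity to the data error)
    |∂E_D/∂w_i| ≤ α   if w_i = 0   (sensitivity below the critical value; weight vanishes)

Under this sketch, weights whose data-error sensitivity cannot reach the critical value α are driven to exactly zero by the penalty term, so pruning falls out of the regularizer itself, with α acting as the adaptively determined threshold.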

Keywords

Mathematics, Regularization (mathematics), Bayesian probability, Gaussian, Pruning, Laplace distribution, Algorithm, Prior probability, Artificial neural network, Applied mathematics, Artificial intelligence, Computer science, Statistics, Mathematical analysis

Publication Info

Year: 1995
Type: Article
Volume: 7
Issue: 1
Pages: 117-143
Citations: 386
Access: Closed

Citation Metrics

386 citations (source: OpenAlex)

Cite This

Peter M. Williams (1995). Bayesian Regularization and Pruning Using a Laplace Prior. Neural Computation, 7(1), 117-143. https://doi.org/10.1162/neco.1995.7.1.117

Identifiers

DOI
10.1162/neco.1995.7.1.117