Abstract
Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exact zeros, becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
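The Laplace prior corresponds to an L1 penalty of the form α Σ|w_i| added to the data error, and, as the abstract notes, weights whose sensitivity to the data error falls below the critical value are driven to exact zeros. The sketch below is only a minimal illustration of that mechanism using a proximal (soft-thresholding) update for the L1 term; it is not the paper's training procedure, and the function name `l1_regularized_step` and the constants `lr` and `alpha` are assumptions made for the example.

```python
import numpy as np

def l1_regularized_step(w, data_grad, lr=0.01, alpha=0.05):
    """One gradient step on the data error, followed by a proximal
    (soft-thresholding) update for the Laplace / L1 penalty alpha * sum(|w|).

    Illustrative sketch only: a weight whose pull from the data error is too
    small to overcome the constant pull of the Laplace prior ends up at
    exactly zero, so pruning falls out of the regularizer itself rather than
    a separate post-processing step.
    """
    w = w - lr * data_grad                                        # descend the data error
    return np.sign(w) * np.maximum(np.abs(w) - lr * alpha, 0.0)   # shrink toward zero, clip at 0

# Toy usage (made-up numbers): weights with little support from the data
# collapse to exact zeros, while well-supported weights merely shrink.
w = np.array([0.50, 0.02, -0.30, 0.001])
data_grad = np.array([0.10, 0.00, -0.05, 0.00])   # pretend the gradients stay fixed
for _ in range(100):
    w = l1_regularized_step(w, data_grad)
print(w)   # roughly [0.35, 0.0, -0.2, 0.0]; the weakly supported weights are exactly 0
```

The soft-thresholding step is what produces exact zeros; a Gaussian (L2) decay only scales weights toward zero and never prunes them outright, which is one reason the abstract contrasts the Laplace prior with a Gaussian regularizer.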
Related Publications
The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization...
A Practical Bayesian Framework for Backpropagation Networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between...
Fixing Weight Decay Regularization in Adam
We note that common implementations of adaptive gradient algorithms, such as Adam, limit the potential benefit of weight decay regularization, because the weights do not decay...
Bootstrapping with Noise: An Effective Regularization Technique
Bootstrap samples with noise are shown to be an effective smoothness and capacity control technique for training feedforward networks and for other statistical methods such as...
Computation with Infinite Neural Networks
For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a Gaussian...
Publication Info
- Year: 1995
- Type: article
- Volume: 7
- Issue: 1
- Pages: 117-143
- Citations: 386
- Access: Closed
Identifiers
- DOI: 10.1162/neco.1995.7.1.117