Abstract
It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that for the purposes of network training, the regularization term can be reduced to a positive semi-definite form that involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise.
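The paper itself is not reproduced on this page, but the abstract's central claim lends itself to a short illustration. The sketch below is hypothetical and not taken from the paper: it contrasts, for a small feedforward network in JAX, a sum-of-squares error evaluated on noise-corrupted inputs with the same error plus a first-derivative (squared-Jacobian) penalty scaled by the noise variance, which is the general shape of the Tikhonov-style regularizer the abstract describes. The network architecture, `sigma`, and all function names are illustrative assumptions.

```python
# Hypothetical illustration of the abstract's claim; not code from the paper.
# (a) training with input noise vs. (b) direct minimization of a sum-of-squares
# error plus a first-derivative (Tikhonov-style) penalty.
import jax
import jax.numpy as jnp


def net(params, x):
    """Small feedforward network y = W2 tanh(x W1 + b1) + b2 (assumed architecture)."""
    W1, b1, W2, b2 = params
    return jnp.tanh(x @ W1 + b1) @ W2 + b2


def noisy_loss(params, x, t, key, sigma):
    """Sum-of-squares error on inputs corrupted by Gaussian noise of std sigma."""
    x_noisy = x + sigma * jax.random.normal(key, x.shape)
    return 0.5 * jnp.sum((net(params, x_noisy) - t) ** 2)


def regularized_loss(params, x, t, sigma):
    """Sum-of-squares error plus a squared-Jacobian penalty on the network mapping.

    The penalty involves only first derivatives of the outputs with respect to
    the inputs, so it is non-negative by construction.
    """
    data_term = 0.5 * jnp.sum((net(params, x) - t) ** 2)
    # Per-pattern Jacobian dy/dx of the network mapping.
    jac = jax.vmap(jax.jacobian(lambda xi: net(params, xi)))(x)
    penalty = 0.5 * jnp.sum(jac ** 2)
    return data_term + sigma ** 2 * penalty
```

In expectation over the noise, and for small `sigma`, the first objective behaves like the second up to higher-order terms, which is the equivalence the abstract refers to; minimizing `regularized_loss` directly avoids sampling noise at all.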
Publication Info
- Year: 1995
- Type: article
- Volume: 7
- Issue: 1
- Pages: 108-116
- Citations: 1239
- Access: Closed
Identifiers
- DOI: 10.1162/neco.1995.7.1.108