Convergence Results for Neural Networks via Electrodynamics

Abstract

We study whether a depth-two neural network can learn another depth-two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: given k fixed protons in R^d and k electrons, each moving under the attractive force from the protons and the repulsive force from the remaining electrons, will all the electrons be matched up with the protons at equilibrium, up to a permutation? Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes in the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time.
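The electron and proton picture in the abstract can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the paper's construction: the Gaussian pairwise potential stands in for the potential induced by an activation function and input distribution, and the names `pairwise_kernel`, `loss`, `grad`, and `descend` are hypothetical. Rows of `A` play the role of the fixed protons (target hidden weights) and rows of `W` the moving electrons (learned hidden weights).

```python
import numpy as np

def pairwise_kernel(X, Y):
    """Assumed Gaussian potential between every row of X and every row of Y.

    Returns the (n, m) kernel matrix and the (n, m, d) differences X_i - Y_j.
    """
    diff = X[:, None, :] - Y[None, :, :]
    return np.exp(-0.5 * (diff ** 2).sum(axis=-1)), diff

def loss(W, A):
    """Pairwise-potential loss: like charges enter with +, unlike charges with -."""
    k_ww, _ = pairwise_kernel(W, W)
    k_wa, _ = pairwise_kernel(W, A)
    k_aa, _ = pairwise_kernel(A, A)
    return k_ww.sum() - 2.0 * k_wa.sum() + k_aa.sum()

def grad(W, A):
    """Gradient of the loss with respect to the electron positions W."""
    k_ww, d_ww = pairwise_kernel(W, W)
    k_wa, d_wa = pairwise_kernel(W, A)
    # Electron-electron term: following -gradient pushes electrons apart.
    repel = -2.0 * (k_ww[:, :, None] * d_ww).sum(axis=1)
    # Electron-proton term: following -gradient pulls electrons toward protons.
    attract = 2.0 * (k_wa[:, :, None] * d_wa).sum(axis=1)
    return repel + attract

def descend(W0, A, lr=0.02, steps=2000):
    """Plain gradient descent on the electron positions; the protons stay fixed."""
    W = W0.copy()
    for _ in range(steps):
        W -= lr * grad(W, A)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 2))   # k = 3 protons in R^2
    W0 = rng.normal(size=(3, 2))  # k = 3 electrons, random start
    W = descend(W0, A)
    print("initial loss:", loss(W0, A))
    print("final loss:  ", loss(W, A))
```

In this sketch the loss and its gradient both vanish when the electrons sit exactly on the protons. Whether gradient descent actually reaches such a matched configuration from an arbitrary start depends on the potential, which is the question the abstract raises.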

Keywords

Computer science, Exponential function, Artificial intelligence, Mathematics, Mathematical analysis

Affiliated Institutions

Johannes Kepler University of Linz, Austria

Publication Info

Year: 2018
Type: preprint
Citations: 2912
Access: Closed

Cite This

APA Style

Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter (2018). Convergence Results for Neural Networks via Electrodynamics. arXiv (Cornell University). https://doi.org/10.4230/lipics.itcs.2018.22

Identifiers

DOI: 10.4230/lipics.itcs.2018.22