Abstract

We study whether a depth-two neural network can learn another depth-two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: given k fixed protons in R^d and k electrons, each moving under the attractive force from the protons and the repulsive force from the remaining electrons, will all the electrons be matched up with the protons at equilibrium, up to a permutation? Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes of the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time.
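To make the electron-proton picture concrete, here is a minimal numerical sketch, not the paper's construction: a depth-two student network with a linear output node is trained by gradient descent to match a depth-two teacher. The teacher's hidden weight vectors play the role of the fixed protons and the student's hidden weight vectors are the moving electrons. The choices of k, d, the Gaussian input distribution, and the tanh activation are illustrative assumptions; the paper constructs a specific activation for which the matching provably occurs.

# Sketch only: gradient descent on a depth-two student matching a depth-two teacher.
# Teacher hidden weights = fixed "protons"; student hidden weights = moving "electrons".
# k, d, the Gaussian input, and tanh are illustrative assumptions, not the paper's activation.
import numpy as np

rng = np.random.default_rng(0)
k, d, n = 4, 6, 20000             # hidden units, input dimension, Monte Carlo samples
act = np.tanh                     # placeholder activation (assumption)

W_star = rng.normal(size=(k, d))  # "protons": teacher hidden weights (fixed)
W = rng.normal(size=(k, d))       # "electrons": student hidden weights (trained)
X = rng.normal(size=(n, d))       # inputs ~ N(0, I); the input distribution sets the force law

def output(weights, inputs):
    # Linear output node with unit top-layer weights: f(x) = sum_i act(w_i . x)
    return act(inputs @ weights.T).sum(axis=1)

y = output(W_star, X)             # teacher labels

lr = 0.05
for step in range(2000):
    H = act(X @ W.T)              # n x k hidden activations of the student
    resid = H.sum(axis=1) - y     # f_student(x) - f_teacher(x)
    # Empirical gradient of the squared loss w.r.t. each hidden weight w_j:
    # grad_j ~ E[ resid * act'(w_j . x) * x ],  with act' = 1 - tanh^2
    G = ((resid[:, None] * (1 - H**2)).T @ X) / n
    W -= lr * G

# At "equilibrium", check whether each electron sits on some proton (up to permutation).
dist = np.linalg.norm(W[:, None, :] - W_star[None, :, :], axis=2)
print(np.round(dist.min(axis=1), 3))

Whether the electrons actually coincide with the protons at equilibrium depends on the activation and the input distribution; that dependence is exactly what the paper's equivalence formalizes.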

Keywords

Computer science, Exponential function, Artificial intelligence, Mathematics, Mathematical analysis

Related Publications

LINE

This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node class...

2015 4564 citations

Network In Network

Abstract: We propose a novel deep network structure called "Network In Network" (NIN) to enhance model discriminability for local patches within the receptive field. The conventional con...

2014 arXiv (Cornell University) 1037 citations

Publication Info

Year: 2018
Type: preprint
Citations: 2912
Access: Closed

Citation Metrics

2912 (OpenAlex)

Cite This

Rina Panigrahy, Ali Rahimi, Sushant Sachdeva, Qiuyi Zhang (2018). Convergence Results for Neural Networks via Electrodynamics. arXiv (Cornell University). https://doi.org/10.4230/lipics.itcs.2018.22

Identifiers

DOI
10.4230/lipics.itcs.2018.22