Abstract
We study whether a depth-two neural network can learn another depth-two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: given k fixed protons in R^d and k electrons, each moving under the attractive force of the protons and the repulsive force of the remaining electrons, will all the electrons be matched up with the protons at equilibrium, up to a permutation? Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes in the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time.
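The equivalence in the abstract can be illustrated by expanding the squared loss into pairwise terms. The following is a minimal sketch, not the authors' construction: it assumes unit output weights on both networks, a ReLU activation, and standard Gaussian inputs, none of which are fixed by the abstract, and the names `potential`, `direct_loss`, and `potential_loss` are hypothetical. It checks numerically that the loss between a student network (hidden weights `A`, the "electrons") and a teacher network (hidden weights `B`, the "protons") equals a sum of electron-electron terms, minus twice the electron-proton terms, plus proton-proton terms.

```python
import numpy as np

def potential(u, v, X, act):
    """Empirical pairwise potential: sample mean of act(x.u) * act(x.v)."""
    return np.mean(act(X @ u) * act(X @ v))

def direct_loss(A, B, X, act):
    """Mean squared error between student (rows of A) and teacher (rows of B).

    Both networks use a linear output node with unit weights (an assumption).
    """
    student = act(X @ A.T).sum(axis=1)
    teacher = act(X @ B.T).sum(axis=1)
    return np.mean((student - teacher) ** 2)

def potential_loss(A, B, X, act):
    """The same loss written as a sum of pairwise potentials."""
    ee = sum(potential(a, a2, X, act) for a in A for a2 in A)  # electron-electron
    ep = sum(potential(a, b, X, act) for a in A for b in B)    # electron-proton
    pp = sum(potential(b, b2, X, act) for b in B for b2 in B)  # proton-proton
    return ee - 2.0 * ep + pp

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
d, k, n = 5, 3, 2000                 # input dimension, hidden nodes, samples
X = rng.standard_normal((n, d))      # assumed Gaussian input distribution
A = rng.standard_normal((k, d))      # student hidden weights ("electrons")
B = rng.standard_normal((k, d))      # teacher hidden weights ("protons")

loss_direct = direct_loss(A, B, X, relu)
loss_potential = potential_loss(A, B, X, relu)
loss_matched = direct_loss(B[::-1], B, X, relu)  # student = permuted teacher
```

Here `loss_direct` and `loss_potential` agree up to floating-point error, and `loss_matched` is zero up to rounding, which corresponds to the electrons sitting on the protons up to a permutation. The cross term enters with a negative sign, which is why the student nodes are drawn toward the teacher nodes while the first term pushes them apart whenever the potential is positive.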
Related Publications
Highway Networks
There is plenty of theoretical and empirical evidence that depth of neural networks is a crucial ingredient for their success. However, network training becomes more difficult w...
LINE
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node class...
The Design and Simulation of a Mobile Radio Network with Distributed Control
A new architecture for mobile radio networks, called the linked cluster architecture, is described, and methods for implementing this architecture using distributed control tech...
Training Very Deep Networks
Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and tra...
Network In Network
We propose a novel deep network structure called Network In Network (NIN) to enhance model discriminability for local patches within the receptive field. The conventional con...
Publication Info
- Year: 2018
- Type: preprint
- Citations: 2912
- Access: Closed
Identifiers
- DOI: 10.4230/lipics.itcs.2018.22