Abstract

There is plenty of theoretical and empirical evidence that depth of neural networks is a crucial ingredient for their success. However, network training becomes more difficult with increasing depth and training of very deep networks remains an open problem. In this extended abstract, we introduce a new architecture designed to ease gradient-based training of very deep networks. We refer to networks with this architecture as highway networks, since they allow unimpeded information flow across several layers on information highways. The architecture is characterized by the use of gating units which learn to regulate the flow of information through a network. Highway networks with hundreds of layers can be trained directly using stochastic gradient descent and with a variety of activation functions, opening up the possibility of studying extremely deep and efficient architectures.
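To make the gating idea concrete, below is a minimal PyTorch sketch of a single highway layer. The combining rule y = H(x) * T(x) + x * (1 - T(x)), with a sigmoid transform gate T, follows the paper's formulation; the layer width, the ReLU choice for H, and the exact gate-bias value are illustrative assumptions, not prescriptions from the paper.

import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Sketch of one highway layer: a learned gate T mixes a nonlinear
    transform H(x) with the unchanged input x (the "carry" path)."""

    def __init__(self, dim: int, gate_bias: float = -2.0):
        super().__init__()
        self.H = nn.Linear(dim, dim)  # plain transform
        self.T = nn.Linear(dim, dim)  # transform gate
        # The paper initializes the gate bias to a negative value so that
        # layers start close to identity ("carry") behavior; -2.0 here is
        # one illustrative choice.
        nn.init.constant_(self.T.bias, gate_bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = torch.sigmoid(self.T(x))  # gate values in (0, 1)
        h = torch.relu(self.H(x))     # H can use a variety of activations
        # y = H(x) * T(x) + x * (1 - T(x))
        return h * t + x * (1.0 - t)

# A deep stack remains trainable because gradients can flow through
# the carry path; depth and width below are arbitrary for the demo.
model = nn.Sequential(*[HighwayLayer(64) for _ in range(50)])
x = torch.randn(8, 64)
print(model(x).shape)  # torch.Size([8, 64])

Because each layer interpolates between its transform and its input, a mostly-closed gate lets information (and gradient) pass through many layers nearly unimpeded, which is what the abstract means by an "information highway".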

Keywords

Computer science, Architecture, Stochastic gradient descent, Information flow, Network architecture, Artificial intelligence, Artificial neural network, Gradient descent, Deep neural networks, Distributed computing, Computer architecture, Computer network

Related Publications

LINE

This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node class...

2015 · 4,564 citations

Publication Info

Year: 2015
Type: Preprint
Citations: 301
Access: Closed

Citation Metrics

301 citations (source: OpenAlex)

Cite This

Rupesh K. Srivastava, Klaus Greff, Jürgen Schmidhuber (2015). Highway Networks. arXiv (Cornell University).