Abstract

There is plenty of theoretical and empirical evidence that depth of neural networks is a crucial ingredient for their success. However, network training becomes more difficult with increasing depth and training of very deep networks remains an open problem. In this extended abstract, we introduce a new architecture designed to ease gradient-based training of very deep networks. We refer to networks with this architecture as highway networks, since they allow unimpeded information flow across several layers on information highways. The architecture is characterized by the use of gating units which learn to regulate the flow of information through a network. Highway networks with hundreds of layers can be trained directly using stochastic gradient descent and with a variety of activation functions, opening up the possibility of studying extremely deep and efficient architectures.
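To make the gating idea concrete, below is a minimal PyTorch sketch of a single highway layer. The combining rule y = H(x) * T(x) + x * (1 - T(x)), with a sigmoid transform gate T, follows the paper's formulation; the layer width, the ReLU choice for H, and the exact gate-bias value are illustrative assumptions, not prescriptions from the paper.

import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Sketch of one highway layer: a learned gate T mixes a nonlinear
    transform H(x) with the unchanged input x (the "carry" path)."""

    def __init__(self, dim: int, gate_bias: float = -2.0):
        super().__init__()
        self.H = nn.Linear(dim, dim)  # plain transform
        self.T = nn.Linear(dim, dim)  # transform gate
        # The paper initializes the gate bias to a negative value so that
        # layers start close to identity ("carry") behavior; -2.0 here is
        # one illustrative choice.
        nn.init.constant_(self.T.bias, gate_bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = torch.sigmoid(self.T(x))  # gate values in (0, 1)
        h = torch.relu(self.H(x))     # H can use a variety of activations
        # y = H(x) * T(x) + x * (1 - T(x))
        return h * t + x * (1.0 - t)

# A deep stack remains trainable because gradients can flow through
# the carry path; depth and width below are arbitrary for the demo.
model = nn.Sequential(*[HighwayLayer(64) for _ in range(50)])
x = torch.randn(8, 64)
print(model(x).shape)  # torch.Size([8, 64])

Because each layer interpolates between its transform and its input, a mostly-closed gate lets information (and gradient) pass through many layers nearly unimpeded, which is what the abstract means by an "information highway".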

Keywords

Computer science, Architecture, Stochastic gradient descent, Information flow, Network architecture, Artificial intelligence, Artificial neural network, Gradient descent, Deep neural networks, Distributed computing, Computer architecture, Computer network

Related Publications

LINE

This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node class...

2015 · 4,564 citations

Publication Info

Year: 2015
Type: Preprint
Citations: 301
Access: Closed

Citation Metrics

301 citations (source: OpenAlex)

Cite This

Rupesh K. Srivastava, Klaus Greff, Jürgen Schmidhuber (2015). Highway Networks. arXiv (Cornell University).