Abstract

There are two widely known issues with properly training Recurrent Neural Networks: the vanishing and the exploding gradient problems, detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We empirically validate our hypothesis and the proposed solutions in the experimental section.
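To make the gradient norm clipping strategy mentioned in the abstract concrete, the sketch below rescales the gradients whenever their global L2 norm exceeds a threshold, so the update keeps its direction but its magnitude stays bounded. This is a minimal illustration, not the authors' code: the function name clip_gradient_norm, the NumPy-based implementation, and the threshold value are assumptions made here for clarity.

    import numpy as np

    def clip_gradient_norm(grads, threshold):
        # Global L2 norm over all parameter gradients.
        total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
        # If the norm exceeds the threshold, rescale every gradient by the
        # same factor so the overall update magnitude is capped.
        if total_norm > threshold:
            grads = [g * (threshold / total_norm) for g in grads]
        return grads

    # Usage: two parameter gradients whose joint norm exceeds the threshold.
    grads = [np.array([3.0, 4.0]), np.array([12.0])]
    clipped = clip_gradient_norm(grads, threshold=1.0)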

Keywords

Constraint (computer-aided design), Perspective (graphical), Computer science, Artificial neural network, Simple (philosophy), Norm (philosophy), Artificial intelligence, Gradient descent, Algorithm, Mathematical optimization, Mathematics, Geometry, Epistemology

Publication Info

Year
2012
Type
preprint
Citations
3778
Access
Closed

Citation Metrics

3778 (OpenAlex)

Cite This

Razvan Pascanu, Tomáš Mikolov, Yoshua Bengio (2012). On the difficulty of training Recurrent Neural Networks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1211.5063

Identifiers

DOI
10.48550/arxiv.1211.5063