Abstract

Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional analysis of variance (fANOVA) framework. In total, we summarize the results of 5400 experimental runs (≈15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
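
The random search mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the hyperparameter names and ranges in `sample_config`, and the `toy_objective` function, which stands in for the validation error of a real training run.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical hyperparameter configuration.

    The names and ranges are illustrative assumptions, not the paper's setup.
    """
    return {
        # Sample the learning rate log-uniformly so every order of
        # magnitude is equally likely.
        "learning_rate": 10 ** rng.uniform(-6, -2),
        "hidden_size": rng.randint(20, 200),
        "momentum": rng.uniform(0.0, 0.99),
    }


def toy_objective(config):
    """Stand-in for a validation error; NOT a real training run."""
    lr_term = (math.log10(config["learning_rate"]) + 3.0) ** 2
    size_term = abs(config["hidden_size"] - 100) / 100.0
    return lr_term + size_term


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials independent random configurations, keep the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(toy_objective, n_trials=200)
print(best_config, best_score)
```

Because each trial is drawn independently of the others, the trials can run in parallel, and the full set of (configuration, score) pairs can be kept for a later analysis of hyperparameter importance.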

Keywords

Hyperparameter, Computer science, Task (project management), Artificial intelligence, Recurrent neural network, Inference, Variety (cybernetics), Machine learning, Polyphony, Function (biology), Space (punctuation), Artificial neural network, Speech recognition

Related Publications

Long Short-Term Memory

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We brief...

1997 Neural Computation 90535 citations

Publication Info

Year
2016
Type
article
Volume
28
Issue
10
Pages
2222-2232
Citations
6357
Access
Closed

Citation Metrics

OpenAlex
6357
Influential
302
CrossRef
4899

Cite This

Klaus Greff, Rupesh K. Srivastava, Jan Koutník et al. (2016). LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222-2232. https://doi.org/10.1109/tnnls.2016.2582924

Identifiers

DOI
10.1109/tnnls.2016.2582924
PMID
27411231
arXiv
1503.04069
