Abstract

Long short-term memory (LSTM) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. Without resets, the internal state values may grow indefinitely and eventually cause the network to break down. Our remedy is an adaptive gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review an illustrative benchmark problem on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve a continual version of that problem. LSTM with forget gates, however, easily solves it in an elegant way.
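
For readers who want the mechanism in concrete terms, the sketch below shows a single LSTM memory-cell update with a forget gate, written in the now-common formulation rather than the paper's original notation. It is an illustrative paraphrase of the idea in the abstract; the variable names, weight layout, and sizes are assumptions, not the authors' code.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step with a forget gate (modern formulation, illustrative only).

        x      : input vector at the current time step
        h_prev : previous cell output (hidden state)
        c_prev : previous internal cell state -- the quantity that, without
                 resets, can grow indefinitely on a continual input stream
        W, b   : stacked weights/biases for the input, forget, output and
                 candidate transforms
        """
        z = W @ np.concatenate([x, h_prev]) + b
        H = h_prev.size
        i = sigmoid(z[0*H:1*H])   # input gate
        f = sigmoid(z[1*H:2*H])   # forget gate: learns when to reset the cell
        o = sigmoid(z[2*H:3*H])   # output gate
        g = np.tanh(z[3*H:4*H])   # candidate cell input
        c = f * c_prev + i * g    # f near 0 resets the state, releasing internal resources
        h = o * np.tanh(c)        # cell output
        return h, c

    # Illustrative usage with made-up sizes.
    rng = np.random.default_rng(0)
    n_in, n_hidden = 4, 8
    W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
    b = np.zeros(4 * n_hidden)
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in range(5):
        h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)

Fixing the forget gate f at 1 recovers the earlier LSTM update, in which the internal state c can only accumulate over a continual input stream; letting the network learn f allows the cell to drive it toward 0 and reset itself at appropriate times.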

Keywords

Benchmark, Recurrent neural network, Computer science, Artificial intelligence, Reset, Sequence, Long short-term memory, State, Deep learning, Machine learning, Artificial neural network, Algorithm

Publication Info

Year: 1999
Type: Article
Volume: 1999
Pages: 850-855
Citations: 2376
Access: Closed

Citation Metrics

OpenAlex: 2376
CrossRef: 1155

Cite This

Felix A. Gers, Jürgen Schmidhuber, Fred Cummins (1999). Learning to forget: continual prediction with LSTM. 9th International Conference on Artificial Neural Networks: ICANN '99, 1999, 850-855. https://doi.org/10.1049/cp:19991218

Identifiers

DOI
10.1049/cp:19991218
