Abstract

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent are effective for tasks involving sequences, visual and otherwise. We describe a class of recurrent convolutional architectures which is end-to-end trainable and suitable for large-scale visual understanding tasks, and demonstrate the value of these models for activity recognition, image captioning, and video description. In contrast to previous models which assume a fixed visual representation or perform simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they learn compositional representations in space and time. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Differentiable recurrent models are appealing in that they can directly map variable-length inputs (e.g., videos) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent sequence models are directly connected to modern visual convolutional network models and can be jointly trained to learn temporal dynamics and convolutional perceptual representations. Our results show that such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined or optimized.
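The abstract describes stacking a recurrent sequence model on top of a convolutional visual encoder and training the two jointly. As an illustration only (not the authors' implementation), the sketch below assumes PyTorch and torchvision and shows a per-frame CNN feeding an LSTM whose averaged hidden states drive an activity classifier; the names RecurrentConvNet, hidden_size, and num_classes are hypothetical.

```python
# Minimal sketch of a recurrent convolutional model: a per-frame CNN encoder
# feeding an LSTM, trained end-to-end. Illustrative only; details differ from
# the paper's architecture.
import torch
import torch.nn as nn
import torchvision.models as models

class RecurrentConvNet(nn.Module):
    def __init__(self, hidden_size=256, num_classes=10):
        super().__init__()
        cnn = models.resnet18(weights=None)      # per-frame visual encoder (assumed backbone)
        feat_dim = cnn.fc.in_features
        cnn.fc = nn.Identity()                   # keep pooled convolutional features
        self.cnn = cnn
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                    # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))    # encode every frame: (B*T, feat_dim)
        feats = feats.view(b, t, -1)             # restore the time axis: (B, T, feat_dim)
        out, _ = self.lstm(feats)                # model temporal dynamics
        return self.classifier(out.mean(dim=1))  # average hidden states over time

# Example: classify a batch of two 8-frame clips of 224x224 RGB frames.
model = RecurrentConvNet()
logits = model(torch.randn(2, 8, 3, 224, 224))  # shape (2, num_classes)
```

Because gradients flow from the classifier through both the LSTM and the CNN, the temporal dynamics and the convolutional perceptual representation are learned jointly, which is the "doubly deep" property the abstract refers to.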

Keywords

Term (time), Computer science, Convolutional neural network, Artificial intelligence, Pattern recognition (psychology), Physics

Publication Info

Year: 2014
Type: preprint
Citations: 1045
Access: Closed

Citation Metrics

OpenAlex: 1045
CrossRef: 622

Cite This

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama et al. (2014). Long-term Recurrent Convolutional Networks for Visual Recognition and Description. https://doi.org/10.21236/ada623249

Identifiers

DOI
10.21236/ada623249
