Abstract

We replace the Hidden Markov Model (HMM) that is traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context computed from the subset of input frames selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates on the TIMIT dataset that are comparable to those of state-of-the-art HMM-based decoders.
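The core attention step the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `attention_context` is a hypothetical helper, and the dot-product scorer stands in for the learned scoring network used in the actual model.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Compute an attention-weighted context vector.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) bi-directional encoder outputs, one per input frame

    Scores each encoder frame against the decoder state (dot product here;
    the paper uses a learned scoring network), softmax-normalizes the scores
    into attention weights, and returns the weighted sum of encoder states.
    """
    scores = encoder_states @ decoder_state        # (T,) one score per frame
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()                       # weights sum to 1 over frames
    context = weights @ encoder_states             # (d,) weighted sum of frames
    return context, weights

# Toy usage: 5 input frames, hidden size 4.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 4))
dec = rng.standard_normal(4)
ctx, w = attention_context(dec, enc)
```

The decoder would use `ctx`, together with its own state, to emit the next phoneme; the weights `w` implicitly encode the alignment between input frames and output symbols.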

Keywords

Hidden Markov model, TIMIT, Speech recognition, Computer science, Recurrent neural network, Encoder, Artificial neural network, Decoding methods, Context, End-to-end principle, Word error rate, Pattern recognition, Artificial intelligence, Algorithm

Publication Info

Year
2014
Type
preprint
Citations
415
Access
Closed

Citation Metrics

415 (OpenAlex)

Cite This

Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho et al. (2014). End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1412.1602

Identifiers

DOI
10.48550/arxiv.1412.1602