Abstract

We replace the Hidden Markov Model (HMM) that is traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context computed from the subset of input frames selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates on the TIMIT dataset that are comparable to those of state-of-the-art HMM-based decoders.
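The core attention step the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `attention_context` is a hypothetical helper, and the dot-product scorer stands in for the learned scoring network used in the actual model.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Compute an attention-weighted context vector.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) bi-directional encoder outputs, one per input frame

    Scores each encoder frame against the decoder state (dot product here;
    the paper uses a learned scoring network), softmax-normalizes the scores
    into attention weights, and returns the weighted sum of encoder states.
    """
    scores = encoder_states @ decoder_state        # (T,) one score per frame
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()                       # weights sum to 1 over frames
    context = weights @ encoder_states             # (d,) weighted sum of frames
    return context, weights

# Toy usage: 5 input frames, hidden size 4.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 4))
dec = rng.standard_normal(4)
ctx, w = attention_context(dec, enc)
```

The decoder would use `ctx`, together with its own state, to emit the next phoneme; the weights `w` implicitly encode the alignment between input frames and output symbols.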

Keywords

Hidden Markov model, TIMIT, Speech recognition, Computer science, Recurrent neural network, Encoder, Artificial neural network, Decoding methods, Context, End-to-end principle, Word error rate, Pattern recognition, Artificial intelligence, Algorithm

Publication Info

Year
2014
Type
preprint
Citations
415
Access
Closed

Citation Metrics

415 (OpenAlex)

Cite This

Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho et al. (2014). End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1412.1602

Identifiers

DOI
10.48550/arxiv.1412.1602