Abstract

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
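The abstract leaves the training interface implicit. As a rough illustration of the idea (not the paper's original code), the sketch below uses PyTorch's nn.CTCLoss, a later library implementation of the same objective, to train an RNN directly on unsegmented inputs. All network sizes, names, and data here are illustrative assumptions.

```python
# Minimal sketch of CTC training, assuming MFCC-like input frames and a
# small character vocabulary; shapes and sizes are illustrative only.
import torch
import torch.nn as nn

T, N, C = 50, 4, 28          # time steps, batch size, classes (27 labels + blank)
rnn = nn.LSTM(input_size=13, hidden_size=64, bidirectional=True)
fc = nn.Linear(128, C)       # map RNN features to per-frame class scores
ctc = nn.CTCLoss(blank=0)    # CTC marginalises over all frame-label alignments

x = torch.randn(T, N, 13)                       # unsegmented acoustic frames
targets = torch.randint(1, C, (N, 10))          # label sequences, no alignment given
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

h, _ = rnn(x)
log_probs = fc(h).log_softmax(dim=-1)           # (T, N, C) per-frame distributions
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                 # gradients sum over all alignments
```

The key point mirrors the abstract: the targets carry no frame-level segmentation, and no post-processing is needed to decode label sequences beyond collapsing repeats and removing blanks.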

Keywords

TIMIT, Computer science, Speech recognition, Recurrent neural network, Connectionism, Hidden Markov model, Sequence, Artificial intelligence, Word, Sequence labeling, Pattern recognition, Artificial neural network, Natural language processing, Mathematics

Publication Info

Year
2006
Type
article
Pages
369-376
Citations
5199
Access
Closed

Cite This

Alex Graves, Santiago Fernández, Faustino Gomez et al. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 369-376. https://doi.org/10.1145/1143844.1143891

Identifiers

DOI
10.1145/1143844.1143891