Deep and Wide: Multiple Layers in Automatic Speech Recognition

2011, IEEE Transactions on Audio Speech and Language Processing, 127 citations

Abstract

This paper reviews a line of research carried out over the last decade in speech recognition assisted by discriminatively trained, feedforward networks. The particular focus is on the use of multiple layers of processing preceding the hidden Markov model based decoding of word sequences. Emphasis is placed on the use of multiple streams of highly dimensioned layers, which have proven useful for this purpose. This paper ultimately concludes that while the deep processing structures can provide improvements for this genre, choice of features and the structure with which they are incorporated, including layer width, can also be significant factors.

Keywords

Computer science; Focus (optics); Speech recognition; Hidden Markov model; Emphasis (telecommunications); Decoding methods; Layer (electronics); Artificial intelligence; Word (group theory); Feed forward; Natural language processing; Pattern recognition (psychology); Linguistics; Telecommunications

Publication Info

Year: 2011
Type: article
Volume: 20
Issue: 1
Pages: 7-13
Citations: 127
Access: Closed

Cite This

Nelson Morgan (2011). Deep and Wide: Multiple Layers in Automatic Speech Recognition. IEEE Transactions on Audio Speech and Language Processing, 20(1), 7-13. https://doi.org/10.1109/tasl.2011.2116010

Identifiers

DOI: 10.1109/tasl.2011.2116010