Deep and Wide: Multiple Layers in Automatic Speech Recognition

Abstract

This paper reviews a line of research carried out over the last decade in speech recognition assisted by discriminatively trained, feedforward networks. The particular focus is on the use of multiple layers of processing preceding the hidden Markov model based decoding of word sequences. Emphasis is placed on the use of multiple streams of highly dimensioned layers, which have proven useful for this purpose. This paper ultimately concludes that while the deep processing structures can provide improvements for this genre, choice of features and the structure with which they are incorporated, including layer width, can also be significant factors.
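As a rough illustration of the structure the abstract describes, the sketch below passes two feature streams through separate wide hidden layers, produces per-stream phone posteriors, and merges them before they would be handed to an HMM decoder. It is a minimal sketch under stated assumptions, not the paper's system: the stream names, dimensions, hidden width, random untrained weights, and the geometric-mean combination rule are all hypothetical choices made for illustration, and the decoding stage is omitted.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Affine layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights; a real system would train these discriminatively."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def stream_posteriors(features, hidden, output):
    """One stream: a wide sigmoid hidden layer followed by a softmax over phones."""
    activations = [1.0 / (1.0 + math.exp(-a)) for a in dense(features, *hidden)]
    return softmax(dense(activations, *output))


def combine_streams(posterior_list):
    """Assumed merge rule: renormalized geometric mean of the stream posteriors."""
    n_classes = len(posterior_list[0])
    log_mean = [
        sum(math.log(max(p[k], 1e-12)) for p in posterior_list) / len(posterior_list)
        for k in range(n_classes)
    ]
    return softmax(log_mean)


# Hypothetical sizes: two streams with different feature dimensions.
rng = random.Random(0)
N_PHONES = 5
HIDDEN_WIDTH = 64
stream_dims = {"stream_a": 9, "stream_b": 13}

nets = {
    name: (init_layer(dim, HIDDEN_WIDTH, rng), init_layer(HIDDEN_WIDTH, N_PHONES, rng))
    for name, dim in stream_dims.items()
}

# One synthetic frame of features per stream.
frame = {name: [rng.gauss(0.0, 1.0) for _ in range(dim)]
         for name, dim in stream_dims.items()}

per_stream = [stream_posteriors(frame[name], *nets[name]) for name in stream_dims]
combined = combine_streams(per_stream)
```

In this sketch, `HIDDEN_WIDTH` is the "wide" axis and the number of stacked stages is the "deep" axis; the combined posteriors stand in for the per-frame scores an HMM-based decoder would consume.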

Keywords

Computer science, Focus (optics), Speech recognition, Hidden Markov model, Emphasis (telecommunications), Decoding methods, Layer (electronics), Artificial intelligence, Word (group theory), Feed forward, Natural language processing, Pattern recognition (psychology), Linguistics, Telecommunications

Related Publications

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition

Dong Yu, Sabato Marco Siniscalchi, Li Deng +1 more

Generation of high-precision sub-phonetic attribute (also known as phonological features) and phone lattices is a key frontend component for detection-based bottom-up speech rec...

2012 64 citations

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton, Li Deng, Dong Yu +8 more

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well ...

2012 IEEE Signal Processing Magazine 10065 citations

Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling

Brian Kingsbury

Acoustic models used in hidden Markov model/neural-network (HMM/NN) speech recognition systems are usually trained with a frame-based cross-entropy error criterion. In contrast,...

2009 238 citations

Deep Belief Networks using discriminative features for phone recognition

Abdelrahman Mohamed, Tara N. Sainath, George E. Dahl +3 more

Deep Belief Networks (DBNs) are multi-layer generative models. They can be trained to model windows of coefficients extracted from speech and they discover multiple layers of fe...

2011 289 citations

Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition

Ossama Abdel‐Hamid, Abdelrahman Mohamed, Hui Jiang +1 more

Convolutional Neural Networks (CNN) have showed success in achieving translation invariance for many image processing tasks. The success is largely attributed to the use of loca...

2012 885 citations

Publication Info

Year: 2011
Type: article
Volume: 20
Issue: 1
Pages: 7-13
Citations: 127
Access: Closed

Cite This

APA Style

Nelson Morgan (2011). Deep and Wide: Multiple Layers in Automatic Speech Recognition. IEEE Transactions on Audio Speech and Language Processing, 20(1), 7-13. https://doi.org/10.1109/tasl.2011.2116010

Identifiers

DOI: 10.1109/tasl.2011.2116010