Abstract

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
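
The abstract describes a feed-forward DNN that maps a context window of acoustic feature frames to posterior probabilities over HMM states. Below is a minimal NumPy sketch of that forward pass, not the authors' implementation; the context width, layer sizes, and number of HMM states are illustrative assumptions rather than values from the paper.

# Minimal sketch (illustrative only) of a DNN acoustic model forward pass:
# stack a window of acoustic frames, pass it through sigmoid hidden layers,
# and emit a softmax over HMM states, i.e. p(state | acoustic window).
import numpy as np

def stack_context(frames, context=5):
    # Concatenate each frame with +/- `context` neighbouring frames (edge-padded).
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(2 * context + 1)])

def init_layer(n_in, n_out, rng):
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    # Hidden layers: logistic sigmoid units. Output layer: softmax over states.
    for W, b in layers[:-1]:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    W, b = layers[-1]
    z = x @ W + b
    z -= z.max(axis=1, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)      # rows are state posteriors

# Hypothetical dimensions: 40-dim frames, an 11-frame window, three hidden
# layers of 1024 units, and 2000 tied HMM states.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 40))              # stand-in acoustic features
x = stack_context(frames, context=5)             # shape (100, 440)
sizes = [x.shape[1], 1024, 1024, 1024, 2000]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
posteriors = forward(x, layers)                  # shape (100, 2000), rows sum to 1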

Keywords

Hidden Markov model, Speech recognition, Computer science, Mixture model, Artificial neural network, Margin (machine learning), Deep neural networks, Pattern recognition (psychology), Frame (networking), Artificial intelligence, Acoustic model, Gaussian, Speech processing, Machine learning

Publication Info

Year: 2012
Type: article
Volume: 29
Issue: 6
Pages: 82-97
Citations: 10065
Access: Closed

Citation Metrics

Citations (OpenAlex): 10065

Cite This

Geoffrey E. Hinton, Li Deng, Dong Yu et al. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82-97. https://doi.org/10.1109/msp.2012.2205597

Identifiers

DOI: 10.1109/msp.2012.2205597