Abstract

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
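
The abstract describes a feed-forward DNN that maps a context window of acoustic feature frames to posterior probabilities over HMM states. Below is a minimal NumPy sketch of that forward pass, not the authors' implementation; the context width, layer sizes, and number of HMM states are illustrative assumptions rather than values from the paper.

# Minimal sketch (illustrative only) of a DNN acoustic model forward pass:
# stack a window of acoustic frames, pass it through sigmoid hidden layers,
# and emit a softmax over HMM states, i.e. p(state | acoustic window).
import numpy as np

def stack_context(frames, context=5):
    # Concatenate each frame with +/- `context` neighbouring frames (edge-padded).
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(2 * context + 1)])

def init_layer(n_in, n_out, rng):
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    # Hidden layers: logistic sigmoid units. Output layer: softmax over states.
    for W, b in layers[:-1]:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    W, b = layers[-1]
    z = x @ W + b
    z -= z.max(axis=1, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)      # rows are state posteriors

# Hypothetical dimensions: 40-dim frames, an 11-frame window, three hidden
# layers of 1024 units, and 2000 tied HMM states.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 40))              # stand-in acoustic features
x = stack_context(frames, context=5)             # shape (100, 440)
sizes = [x.shape[1], 1024, 1024, 1024, 2000]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
posteriors = forward(x, layers)                  # shape (100, 2000), rows sum to 1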

Keywords

Hidden Markov model, Speech recognition, Computer science, Mixture model, Artificial neural network, Margin (machine learning), Deep neural networks, Pattern recognition (psychology), Frame (networking), Artificial intelligence, Acoustic model, Gaussian, Speech processing, Machine learning

Publication Info

Year: 2012
Type: article
Volume: 29
Issue: 6
Pages: 82-97
Citations: 10065
Access: Closed

Citation Metrics

Citations (OpenAlex): 10065

Cite This

Geoffrey E. Hinton, Li Deng, Dong Yu et al. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82-97. https://doi.org/10.1109/msp.2012.2205597

Identifiers

DOI: 10.1109/msp.2012.2205597