Abstract
In this paper, we show how new training principles and optimization techniques for neural networks can be applied to different network structures. In particular, we revisit the Recurrent Neural Network (RNN), which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as the HMM. We apply pretraining principles developed for Deep Neural Networks (DNNs), together with second-order optimization techniques, to train an RNN. We then explore its application to the Aurora2 speech recognition task under mismatched noise conditions using a Tandem approach. The RNN achieves top performance on clean speech and under high-noise conditions compared to multi-layer perceptrons (MLPs) and DNNs, with the added benefit of being a "deeper" model than an MLP yet more compact than a DNN.
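As a rough illustration of the recurrent state update the abstract refers to, the sketch below runs a single-hidden-layer RNN forward over a sequence of acoustic feature frames. The dimensions, tanh non-linearity, softmax output, and all variable names are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Minimal RNN forward pass: the hidden state h_t summarizes the observation
# history, giving the non-linear Markovian dynamics described in the abstract.
# All sizes and weight names below are hypothetical, for illustration only.

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 39, 100, 55          # e.g. 39-dim features -> class posteriors

W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # hidden-to-hidden (recurrent)
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden-to-output weights
b_h = np.zeros(n_hid)
b_y = np.zeros(n_out)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(x_seq):
    """Run the recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    and emit per-frame posteriors y_t = softmax(W_hy h_t + b_y)."""
    h = np.zeros(n_hid)
    posteriors = []
    for x_t in x_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        posteriors.append(softmax(W_hy @ h + b_y))
    return np.array(posteriors)

# Example: 100 frames of 39-dim acoustic features.
frames = rng.normal(size=(100, n_in))
post = rnn_forward(frames)   # shape (100, 55): per-frame class posteriors
```

In a Tandem setup such as the one the abstract describes, per-frame posteriors like these (typically log-transformed and decorrelated) would serve as input features to a conventional GMM-HMM recognizer.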
Publication Info
- Year: 2012
- Type: article
- Pages: 4085-4088
- Citations: 139
Identifiers
- DOI: 10.1109/icassp.2012.6288816