Abstract
Neural network (NN) bottleneck (BN) features are typically created by training a NN with a middle bottleneck layer. Recently, an alternative structure was proposed which trains a NN with a constant number of hidden units to predict output targets, and then reduces the dimensionality of these output probabilities through an auto-encoder, to create auto-encoder bottleneck (AE-BN) features. The benefit of placing the BN after the posterior-estimation network is that it avoids the loss in frame classification accuracy incurred by networks that place the BN before the softmax. In this work, we investigate the use of pre-training when creating AE-BN features. Our experiments indicate that with the AE-BN architecture, pre-trained and deeper NNs produce better AE-BN features. On a 50-hour English Broadcast News task, the AE-BN features provide over a 1% absolute improvement compared to a state-of-the-art GMM/HMM system with a WER of 18.8% and a pre-trained NN hybrid system with a WER of 18.4%. In addition, on a larger 430-hour Broadcast News task, AE-BN features provide a 0.5% absolute improvement over a strong GMM/HMM baseline with a WER of 16.0%. Finally, system combination of the GMM/HMM baseline and AE-BN systems provides an additional 0.5% absolute improvement over the AE-BN system alone on the 430-hour task, yielding a final WER of 15.0%.
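The abstract describes the AE-BN pipeline only at a high level; the following is a minimal PyTorch-style sketch of that structure, not the authors' implementation. All layer sizes, layer counts, and names (`posterior_net`, `encoder`, `ae_bn_features`) are illustrative assumptions: a constant-width NN estimates target posteriors, and an auto-encoder placed after the softmax compresses those posteriors into low-dimensional bottleneck features.

```python
# Sketch of the AE-BN feature pipeline (hypothetical sizes and names; not the
# paper's exact configuration). A DNN is trained to predict context-dependent
# state posteriors; an auto-encoder with a narrow bottleneck then compresses
# those posteriors, and the bottleneck activations serve as AE-BN features.
import torch
import torch.nn as nn

N_INPUT = 360      # e.g. 40-dim features with several frames of context (assumed)
N_HIDDEN = 1024    # constant number of hidden units per layer (assumed)
N_TARGETS = 2220   # number of context-dependent HMM state targets (assumed)
N_BOTTLENECK = 40  # AE-BN feature dimensionality (assumed)

# Posterior-estimation network: all hidden layers keep the same width, so no
# internal bottleneck degrades frame classification accuracy.
posterior_net = nn.Sequential(
    nn.Linear(N_INPUT, N_HIDDEN), nn.Sigmoid(),
    nn.Linear(N_HIDDEN, N_HIDDEN), nn.Sigmoid(),
    nn.Linear(N_HIDDEN, N_HIDDEN), nn.Sigmoid(),
    nn.Linear(N_HIDDEN, N_TARGETS),  # logits; softmax applied below
)

# Auto-encoder placed *after* the posterior estimates: it reconstructs the
# posterior vector through a narrow bottleneck layer.
encoder = nn.Sequential(nn.Linear(N_TARGETS, N_BOTTLENECK), nn.Sigmoid())
decoder = nn.Linear(N_BOTTLENECK, N_TARGETS)

def ae_bn_features(x):
    """Return bottleneck (AE-BN) features for a batch of acoustic frames."""
    with torch.no_grad():
        posteriors = torch.softmax(posterior_net(x), dim=-1)
        return encoder(posteriors)

# Example: 8 random frames -> 8 x 40 AE-BN feature vectors.
features = ae_bn_features(torch.randn(8, N_INPUT))
print(features.shape)  # torch.Size([8, 40])
```

In practice both networks would be trained first (the posterior network with a frame-level cross-entropy criterion, the auto-encoder to reconstruct the posteriors), and the bottleneck outputs would then typically be used as the feature stream for the GMM/HMM system, as described in the abstract.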
Publication Info
- Year: 2012
- Type: article
- Citations: 181
- Access: Closed
Identifiers
- DOI: 10.1109/icassp.2012.6288833