Abstract
Conditional random fields (CRFs) have recently gained popularity in automatic speech recognition (ASR) applications. CRFs have previously been shown to be effective combiners of posterior estimates from multilayer perceptrons (MLPs) in phone and word recognition tasks. In this paper, we describe a novel hybrid Multilayer-CRF structure (ML-CRF), in which an MLP-like hidden layer serves as the input to the CRF. We also propose a technique, based on error backpropagation, for training the ML-CRF directly to optimize a criterion based on the conditional log-likelihood. The proposed technique thus allows the CRF's feature functions to be learned implicitly. We present results from initial phone recognition experiments on the TIMIT database indicating that the proposed method is a promising approach to training CRFs.
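The training idea above can be illustrated with a small sketch. It makes several assumptions that the abstract does not specify: a single sigmoid hidden layer, a linear-chain CRF whose per-frame state scores are a linear function of the hidden units (so those units act as learned state feature functions), and a transition score matrix `A`. All names (`ml_crf_nll_and_grads`, `sgd_update`, `W1`, `b1`, `W2`, `A`) are hypothetical. This is not the authors' implementation. It shows how the gradient of the negative conditional log-likelihood with respect to the state scores (posterior marginal minus gold indicator, obtained by forward-backward) is backpropagated through the hidden layer.

```python
import math
import random


def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ml_crf_nll_and_grads(X, labels, W1, b1, W2, A):
    """Negative conditional log-likelihood -log p(labels | X) and its gradients.

    Hidden layer (assumed): h_t[j] = sigmoid(W1[j] . x_t + b1[j]).
    State scores (assumed): S[t][y] = W2[y] . h_t, so the hidden units play the
    role of learned CRF state feature functions. A[i][k] scores transition i -> k.
    """
    T, K = len(X), len(W2)
    H = [[sigmoid(sum(w * xd for w, xd in zip(row, x)) + b) for row, b in zip(W1, b1)]
         for x in X]
    S = [[sum(w * h for w, h in zip(W2[y], Ht)) for y in range(K)] for Ht in H]

    # Forward (alpha) and backward (beta) recursions in the log domain.
    alpha = [S[0][:]]
    for t in range(1, T):
        alpha.append([S[t][k] + logsumexp([alpha[t - 1][i] + A[i][k] for i in range(K)])
                      for k in range(K)])
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [logsumexp([A[i][k] + S[t + 1][k] + beta[t + 1][k] for k in range(K)])
                   for i in range(K)]
    log_z = logsumexp(alpha[-1])

    gold = sum(S[t][labels[t]] for t in range(T))
    gold += sum(A[labels[t - 1]][labels[t]] for t in range(1, T))
    nll = log_z - gold

    # dNLL/dS[t][y] = posterior marginal p(y_t = y | X) minus the gold indicator.
    gS = [[math.exp(alpha[t][y] + beta[t][y] - log_z) - (1.0 if y == labels[t] else 0.0)
           for y in range(K)] for t in range(T)]
    # dNLL/dA[i][k] = expected minus observed count of the transition i -> k.
    gA = [[0.0] * K for _ in range(K)]
    for t in range(1, T):
        for i in range(K):
            for k in range(K):
                gA[i][k] += math.exp(alpha[t - 1][i] + A[i][k] + S[t][k] + beta[t][k] - log_z)
        gA[labels[t - 1]][labels[t]] -= 1.0

    # Backpropagate from the state scores into W2, then through the sigmoid into W1, b1.
    gW2 = [[0.0] * len(b1) for _ in range(K)]
    gW1 = [[0.0] * len(X[0]) for _ in range(len(b1))]
    gb1 = [0.0] * len(b1)
    for t in range(T):
        for j, h in enumerate(H[t]):
            dh = 0.0
            for y in range(K):
                gW2[y][j] += gS[t][y] * h
                dh += gS[t][y] * W2[y][j]
            dpre = dh * h * (1.0 - h)  # sigmoid'(z) = h * (1 - h)
            gb1[j] += dpre
            for d, xd in enumerate(X[t]):
                gW1[j][d] += dpre * xd
    return nll, {"W1": gW1, "b1": gb1, "W2": gW2, "A": gA}


def sgd_update(params, grads, lr):
    """In-place gradient-descent step on nested-list parameters."""
    for name, g in grads.items():
        p = params[name]
        for i, gi in enumerate(g):
            if isinstance(gi, list):
                for j, gij in enumerate(gi):
                    p[i][j] -= lr * gij
            else:
                p[i] -= lr * gi


if __name__ == "__main__":
    # Toy data (hypothetical): 5 frames of 3-dim features, 3 classes, 4 hidden units.
    random.seed(0)
    X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]
    labels = [0, 1, 1, 2, 0]
    params = {
        "W1": [[random.gauss(0, 0.5) for _ in range(3)] for _ in range(4)],
        "b1": [0.0] * 4,
        "W2": [[random.gauss(0, 0.5) for _ in range(4)] for _ in range(3)],
        "A": [[0.0] * 3 for _ in range(3)],
    }
    for step in range(20):
        nll, grads = ml_crf_nll_and_grads(X, labels, params["W1"], params["b1"],
                                          params["W2"], params["A"])
        sgd_update(params, grads, lr=0.5)
        if step % 5 == 0:
            print(f"step {step}: NLL = {nll:.4f}")
```

Because the hidden-layer weights receive gradients through the same CRF objective as the transition scores, the state feature functions are trained jointly with the CRF instead of being fixed in advance. A practical system would add state-score biases, minibatching, and a numerically safe sigmoid.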
Publication Info
- Year: 2010
- Type: article
- Pages: 5534-5537
- Citations: 31
- Access: Closed
Identifiers
- DOI: 10.1109/icassp.2010.5495222