Abstract

Despite its successes, speech recognition still has significant performance limitations, particularly for conversational speech and for speech with significant acoustic degradations from noise or reverberation. For this reason, the authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article. Note in passing that the authors and many others have already taken advantage of processing techniques that incorporate information over long time ranges, for instance for normalization, by cepstral mean subtraction (B. Atal, 1974) or relative spectral analysis (RASTA; H. Hermansky and N. Morgan, 1994). The authors have also proposed features based on speech sound class posterior probabilities, which have good properties for both classification and stream combination.
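
As an illustration of the long-time-range normalization mentioned in the abstract, the following is a minimal sketch of cepstral mean subtraction. It is not taken from the article: the function name, the list-of-frames input layout, and the toy values are assumptions made for this example. The idea it shows is that a fixed linear channel appears as an additive constant in the cepstral domain, so subtracting the per-coefficient average over the whole utterance removes it.

```python
from typing import List, Sequence


def cepstral_mean_subtraction(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean, taken over all frames, from each frame.

    `frames` is assumed to hold one cepstral vector per analysis frame,
    all of the same length (a hypothetical layout chosen for this sketch).
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Long-term average of each cepstral coefficient over the whole utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    # Remove that average from every frame.
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy "utterance": three frames of two cepstral coefficients (made-up values).
utterance = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
normalized = cepstral_mean_subtraction(utterance)

# The same utterance with a constant offset added to every frame,
# standing in for a fixed channel effect.
shifted = [[c0 + 0.5, c1 - 1.5] for c0, c1 in utterance]
normalized_shifted = cepstral_mean_subtraction(shifted)
```

In this sketch `normalized` and `normalized_shifted` come out identical, which is the property that makes the normalization useful. The mean is computed over the full utterance, so the operation draws on a much longer time span than a single analysis frame.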

Keywords

Speech recognition; Computer science; Normalization (sociology); Reverberation; Cepstrum; Mel-frequency cepstrum; Speech processing; Linear predictive coding; Artificial intelligence; Pattern recognition (psychology); Feature extraction; Acoustics

Publication Info

Year: 2005
Type: article
Volume: 22
Issue: 5
Pages: 81-88
Citations: 94
Access: Closed

Cite This

N. Morgan, Qifeng Zhu, Andreas Stolcke et al. (2005). Pushing the envelope - aside [speech recognition]. IEEE Signal Processing Magazine, 22(5), 81-88. https://doi.org/10.1109/msp.2005.1511826

Identifiers

DOI: 10.1109/msp.2005.1511826