Abstract

In speech emotion recognition, the training and test data used for system development usually fit each other well, but further 'similar' data may be available. Transfer learning helps to exploit such similar data for training, despite the inherent dissimilarities, in order to boost a recogniser's performance. In this context, this paper presents a sparse autoencoder method for feature transfer learning in speech emotion recognition. In our proposed method, a common emotion-specific mapping rule is learnt from a small set of labelled data in a target domain. Newly reconstructed data are then obtained by applying this rule to the emotion-specific data in a different domain. Experimental results on six standard databases show that our approach significantly improves performance relative to learning each source domain independently.
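The idea in the abstract can be sketched in code: train a sparse autoencoder on a small set of labelled target-domain features, then pass source-domain features of the same emotion class through the learnt encoder/decoder to obtain reconstructed, target-adapted data. The following is a minimal numpy sketch, not the authors' implementation; the single hidden layer, sigmoid units, sparsity target `rho`, penalty weight `beta`, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SparseAutoencoder:
    """Single-hidden-layer sparse autoencoder (illustrative sketch).

    Hyperparameters are assumptions for the example, not values from the paper.
    """
    def __init__(self, n_in, n_hidden, rho=0.05, beta=0.1, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.rho, self.beta, self.lr = rho, beta, lr

    def fit(self, X, epochs=200):
        n = X.shape[0]
        for _ in range(epochs):
            H = sigmoid(X @ self.W1 + self.b1)   # hidden activations
            R = sigmoid(H @ self.W2 + self.b2)   # reconstruction of X
            rho_hat = H.mean(axis=0)             # mean activation per hidden unit
            # output-layer error (squared loss through sigmoid output)
            d2 = (R - X) * R * (1 - R)
            # KL-divergence sparsity gradient added to the hidden-layer error
            sparse = self.beta * (-self.rho / rho_hat
                                  + (1 - self.rho) / (1 - rho_hat))
            d1 = (d2 @ self.W2.T + sparse) * H * (1 - H)
            # batch gradient-descent updates
            self.W2 -= self.lr * (H.T @ d2) / n
            self.b2 -= self.lr * d2.mean(axis=0)
            self.W1 -= self.lr * (X.T @ d1) / n
            self.b1 -= self.lr * d1.mean(axis=0)
        return self

    def reconstruct(self, X):
        """Apply the learnt mapping to new (e.g. source-domain) features."""
        return sigmoid(sigmoid(X @ self.W1 + self.b1) @ self.W2 + self.b2)
```

Usage would follow the transfer scheme: `ae = SparseAutoencoder(d, k).fit(target_features)`, then `adapted = ae.reconstruct(source_features)` for each emotion class, with the adapted data used to train the recogniser. Feature normalisation to [0, 1] is assumed for the sigmoid output layer.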

Keywords

Autoencoder, Computer science, Transfer of learning, Artificial intelligence, Feature (linguistics), Speech recognition, Domain (mathematical analysis), Exploit, Context (archaeology), Encoder, Test data, Pattern recognition (psychology), Training set, Set (abstract data type), Data set, Machine learning, Deep learning, Mathematics

Related Publications

Universal Sentence Encoder

We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate pe...

2018 arXiv (Cornell University) 1289 citations

Publication Info

Year: 2013
Type: article
Pages: 511-516
Citations: 358
Access: Closed

Citation Metrics

OpenAlex: 358
Influential: 12
CrossRef: 252

Cite This

Jun Deng, Zixing Zhang, Erik Marchi et al. (2013). Sparse Autoencoder-Based Feature Transfer Learning for Speech Emotion Recognition. 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 511-516. https://doi.org/10.1109/acii.2013.90

Identifiers

DOI
10.1109/acii.2013.90
