Abstract

Speech is central to human communication, both for expressing and for understanding feelings. Emotional speech processing faces challenges in expert data sampling, dataset organization, and the computational complexity of large-scale analysis. This study aims to reduce data redundancy and high dimensionality by introducing a new speech emotion recognition system. The system uses a Diffusion Map for dimensionality reduction and includes Decision Tree and K-Nearest Neighbors (KNN) ensemble classifiers. These strategies are proposed to increase the accuracy of speech emotion recognition. Speech emotion recognition is gaining popularity in affective computing, with applications in medicine, industry, and academia. This project aims to provide an efficient and robust framework for real-time emotion identification. To identify emotions with supervised machine learning models, the work uses paralinguistic features such as intensity, pitch, and Mel-frequency cepstral coefficients (MFCC). For classification, the experimental analysis combines prosodic and spectral information using Random Forest, Multilayer Perceptron (MLP), Support Vector Machine (SVM), KNN, and Gaussian Naïve Bayes. Their fast training times make these machine learning models well suited to real-time applications. SVM and MLP achieve the highest accuracies, at 70.86% and 79.52%, respectively. Comparisons with benchmarks show significant improvements over earlier models.
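
The combination of prosodic and spectral information described in the abstract can be illustrated with a toy sketch. The following is not the authors' pipeline: it assumes feature-level fusion by simple concatenation, uses a plain KNN vote (one of the classifiers the abstract names), and runs on synthetic feature values rather than features extracted from audio. The helper names (`fuse`, `standardize`, `knn_predict`, `make_sample`, `classify`), the two emotion labels, and the class centres are all hypothetical.

```python
import math
import random
from collections import Counter


def fuse(prosodic, spectral):
    """Feature-level fusion (assumed here): concatenate both feature groups."""
    return list(prosodic) + list(spectral)


def standardize(rows):
    """Z-score each column so large-valued features (pitch in Hz) do not dominate."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def make_sample(rng, label):
    """Draw one synthetic utterance; the class centres are invented for the demo."""
    if label == "angry":
        pitch, intensity, mfcc_centre = 220.0, 75.0, 2.0
    else:
        pitch, intensity, mfcc_centre = 120.0, 55.0, -2.0
    prosodic = [rng.gauss(pitch, 10.0), rng.gauss(intensity, 3.0)]
    spectral = [rng.gauss(mfcc_centre, 0.5) for _ in range(3)]  # stand-in for MFCCs
    return fuse(prosodic, spectral)


rng = random.Random(0)  # fixed seed so the toy data is reproducible
labels = ["angry", "calm"] * 10
raw_train = [make_sample(rng, y) for y in labels]
train_x, means, stds = standardize(raw_train)


def classify(raw_vector, k=3):
    """Scale a raw fused vector with the training statistics, then apply KNN."""
    scaled = [(v - m) / s for v, m, s in zip(raw_vector, means, stds)]
    return knn_predict(train_x, labels, scaled, k)


print(classify(fuse([215.0, 74.0], [1.8, 2.1, 2.0])))
print(classify(fuse([125.0, 56.0], [-2.2, -1.9, -2.0])))
```

In practice the prosodic and spectral values would come from an audio feature extractor, and the classifier would be evaluated on a labelled emotional speech corpus; neither step is shown here.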

Keywords

Speech recognition, Computer science, Emotion detection, Psychology, Emotion recognition

Publication Info

Year: 2024
Type: article
Pages: 1526-1534
Citations: 1659
Access: Closed

Citation Metrics

OpenAlex: 1659
Influential: 0
CrossRef: 31

Cite This

Z. B. M. D. Shah, SHAN Zhiyong, Adnan (2024). Enhancements in Immediate Speech Emotion Detection: Harnessing Prosodic and Spectral Characteristics. International Journal of Innovative Science and Research Technology (IJISRT), 1526-1534. https://doi.org/10.38124/ijisrt/ijisrt24apr872

Identifiers

DOI: 10.38124/ijisrt/ijisrt24apr872
