Abstract
Speech is essential to human communication for expressing and understanding feelings. Emotional speech processing has challenges with expert data sampling, dataset organization, and computational complexity in large-scale analysis. This study aims to reduce data redundancy and high dimensionality by introducing a new speech emotion recognition system. The system employs Diffusion Map to reduce dimensionality and includes Decision Trees and K-Nearest Neighbors(KNN)ensemble classifiers. These strategies are suggested to increase voice emotion recognition accuracy. Speech emotion recognition is gaining popularity in affective computing for usage in medical, industry, and academics. This project aims to provide an efficient and robust real-time emotion identification framework. In order to identify emotions using supervised machine learning models, this work makes use of paralinguistic factors such as intensity, pitch, and MFCC. In order to classify data, experimental analysis integrates prosodic and spectral information utilizing methods like Random Forest, Multilayer Perceptron, SVM, KNN, and Gaussian Naïve Bayes. Fast training times make these machine learning models excellent for real-time applications. SVM and MLP have the highest accuracy at 70.86% and 79.52%, respectively. Comparisons to benchmarks show significant improvements over earlier models.
Keywords
Related Publications
Exploratory analysis and visualization of speech and music by locally linear embedding
Many problems in voice recognition and audio processing involve feature extraction from raw waveforms. The goal of feature extraction is to reduce the dimensionality of the audi...
Using Hashtags to Capture Fine Emotion Categories from Tweets
Detecting emotions in microblogs and social media posts has applications for industry, health, and security. Statistical, supervised automatic methods for emotion detection rely...
Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition
Generation of high-precision sub-phonetic attribute (also known as phonological features) and phone lattices is a key frontend component for detection-based bottom-up speech rec...
EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks
In this paper, a multichannel EEG emotion recognition method based on a novel dynamical graph convolutional neural networks (DGCNN) is proposed. The basic idea of the proposed E...
PIXOR: Real-time 3D Object Detection from Point Clouds
We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Speed is critical as detection is a necessary component for safet...
Publication Info
- Year
- 2024
- Type
- article
- Pages
- 1526-1534
- Citations
- 1659
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.38124/ijisrt/ijisrt24apr872