Authors :
Akash Raghav; Dr. C. Lakshmi
Volume/Issue :
Volume 7 - 2022, Issue 9 - September
Google Scholar :
https://bit.ly/3IIfn9N
Scribd :
https://bit.ly/3EPIbio
DOI :
https://doi.org/10.5281/zenodo.7215574
Abstract :
The goal of this project is to detect a speaker's emotions while he or she speaks. Speech produced under fear, rage, or delight, for example, becomes loud and fast, with a wider and more varied pitch range, whereas in a moment of grief or tiredness speech is slow and low-pitched. Voice and speech patterns can therefore be used to detect human emotions, which can help improve human-machine interaction. We present a Convolutional Neural Network (CNN), a Support Vector Machine (SVM), and a Multilayer Perceptron (MLP) classifier trained on acoustic features extracted from emotional speech, chiefly Mel Frequency Cepstral Coefficients (MFCCs). The models are trained to recognize eight emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). Using the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) datasets, the proposed approach achieves accuracies of 86 percent, 84 percent, and 82 percent for the eight emotions with the CNN, MLP, and SVM classifiers, respectively.
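To make the described pipeline concrete, the sketch below shows an MFCC-based feature extraction step followed by an MLP classifier on RAVDESS-style files. It is a minimal illustration only, assuming librosa for feature extraction and scikit-learn's MLPClassifier; the directory layout, label parsing, and hyperparameters are assumptions, not the authors' exact implementation.

# Minimal sketch of an MFCC -> classifier pipeline (assumed tooling: librosa, scikit-learn).
import glob
import os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# RAVDESS emotion codes (third hyphen-separated field of the file name).
EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def extract_mfcc(path, n_mfcc=40):
    """Load a clip and return its time-averaged MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)

features, labels = [], []
for path in glob.glob("RAVDESS/**/*.wav", recursive=True):  # assumed dataset layout
    code = os.path.basename(path).split("-")[2]
    features.append(extract_mfcc(path))
    labels.append(EMOTIONS[code])

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), labels, test_size=0.2, random_state=42)

clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))

Swapping the MLP for an SVM (sklearn.svm.SVC) or a CNN over the full MFCC time-frequency matrix follows the same feature-extraction pattern; only the classifier stage changes.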