Authors :
Akash Raghav; Dr. C. Lakshmi
Volume/Issue :
Volume 7 - 2022, Issue 9 - September
Google Scholar :
https://bit.ly/3IIfn9N
Scribd :
https://bit.ly/3EPIbio
DOI :
https://doi.org/10.5281/zenodo.7215574
Abstract :
The goal of this project is to detect a speaker's emotions while he or she speaks. Speech produced under fear, rage, or delight, for example, becomes loud and fast, with a wider and more varied pitch range, whereas in a moment of grief or tiredness speech is slow and low-pitched. Voice and speech patterns can therefore be used to detect human emotions, which can help improve human-machine interaction. We present a Convolutional Neural Network (CNN), a Support Vector Machine (SVM), and a Multilayer Perceptron (MLP) classifier trained on acoustic features extracted from emotional speech, chiefly Mel Frequency Cepstral Coefficients (MFCCs). The models are trained to recognize eight emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). Using the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) datasets, the proposed approach achieves accuracies of 86 percent, 84 percent, and 82 percent for the eight emotions with the CNN, MLP, and SVM classifiers, respectively.
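To make the described pipeline concrete, the sketch below shows an MFCC-based feature extraction step followed by an MLP classifier on RAVDESS-style files. It is a minimal illustration only, assuming librosa for feature extraction and scikit-learn's MLPClassifier; the directory layout, label parsing, and hyperparameters are assumptions, not the authors' exact implementation.

# Minimal sketch of an MFCC -> classifier pipeline (assumed tooling: librosa, scikit-learn).
import glob
import os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# RAVDESS emotion codes (third hyphen-separated field of the file name).
EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def extract_mfcc(path, n_mfcc=40):
    """Load a clip and return its time-averaged MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)

features, labels = [], []
for path in glob.glob("RAVDESS/**/*.wav", recursive=True):  # assumed dataset layout
    code = os.path.basename(path).split("-")[2]
    features.append(extract_mfcc(path))
    labels.append(EMOTIONS[code])

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), labels, test_size=0.2, random_state=42)

clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))

Swapping the MLP for an SVM (sklearn.svm.SVC) or a CNN over the full MFCC time-frequency matrix follows the same feature-extraction pattern; only the classifier stage changes.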