Authors :
M Ravi; Dr. A Obulesu; CH Vinod Vara Prasad; N Abhishek; N Rithish Reddy; V Anil Chary
Volume/Issue :
Volume 10 - 2025, Issue 4 - April
Google Scholar :
https://tinyurl.com/ykwrhxme
Scribd :
https://tinyurl.com/267fh5p9
DOI :
https://doi.org/10.38124/ijisrt/25apr1252
Abstract :
Speaker recognition is an essential aspect of human-computer interaction, with applications in security and personalized services. This project proposes an end-to-end speaker recognition system built on Long Short-Term Memory (LSTM) neural networks. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted as audio features and fed to an LSTM model, which classifies speakers with high accuracy. The proposed system demonstrates the efficacy of LSTMs for modeling temporal features and achieves robust performance in noisy environments.
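The abstract describes a two-stage pipeline (MFCC extraction, then an LSTM classifier) but gives no parameters. As an illustration only, the NumPy sketch below assumes 16 kHz mono audio, 25 ms frames with a 10 ms hop, 26 mel filters, 13 coefficients, per-utterance mean and variance normalization, and a single-layer LSTM whose final hidden state feeds a softmax over speakers. The helper names (`mfcc`, `init_params`, `lstm_classify`) and all sizes are hypothetical, the weights are random rather than trained, and this is not the authors' implementation.

```python
import numpy as np


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Hypothetical helper: return an (n_frames, n_ceps) MFCC matrix for a 1-D signal."""
    # Pre-emphasis boosts high frequencies before framing.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each (zero-padded) frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale.
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Orthonormal DCT-II decorrelates the log energies; keep the first n_ceps terms.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    dct *= np.sqrt(2 / n_mels)
    dct[0] /= np.sqrt(2)
    return log_mel @ dct.T


def init_params(n_in, hidden, n_speakers, seed=0):
    """Hypothetical helper: small random weights (a trained model would learn these)."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return (rng.normal(0, s, (4 * hidden, n_in)),    # input-to-gates weights W
            rng.normal(0, s, (4 * hidden, hidden)),  # hidden-to-gates weights U
            np.zeros(4 * hidden),                    # gate biases b
            rng.normal(0, s, (n_speakers, hidden)),  # output layer weights
            np.zeros(n_speakers))                    # output layer biases


def lstm_classify(features, params):
    """Run a single-layer LSTM over (T, d) features; return speaker probabilities."""
    W, U, b, W_out, b_out = params
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    sigmoid = lambda x: 1 / (1 + np.exp(-x))
    for x_t in features:
        z = W @ x_t + U @ h + b                 # all four gates stacked, shape (4*hidden,)
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                       # cell state carries long-range context
        h = o * np.tanh(c)                      # hidden state passed to the next step
    logits = W_out @ h + b_out                  # classify from the final hidden state
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum()


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # One second of a noisy synthetic tone stands in for a real recording.
    audio = np.sin(2 * np.pi * 220 * t) + 0.05 * np.random.default_rng(1).normal(size=sr)
    feats = mfcc(audio, sr=sr)
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)  # per-utterance normalization
    probs = lstm_classify(feats, init_params(13, 32, n_speakers=5))
    print(feats.shape, probs.round(3))
```

In practice the weights would be learned from labeled utterances, for example with a deep learning framework's built-in LSTM layer and backpropagation through time; the hand-written forward pass here only makes the gate equations explicit.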
Keywords :
Speaker Recognition, Deep Learning, MFCC, LSTM, Audio Classification.