Authors :
K. Sandhya; P. Nithik Roshan; K. Bala Manikanta; Priyanka Pandarinath
Volume/Issue :
Volume 10 - 2025, Issue 5 - May
Google Scholar :
https://tinyurl.com/4bt3hhja
DOI :
https://doi.org/10.38124/ijisrt/25may1394
Abstract :
Automatic speech recognition (ASR) has advanced from recognizing a limited set of sounds to fluently understanding natural
language. It is used in voice search, virtual assistants, and speech-to-text systems to enhance user experience and
productivity, having begun with basic sound recognition and evolved into comprehensive language comprehension. Despite
significant advancements in ASR technology, existing systems often struggle to accurately transcribe spoken language in
contexts where semantic nuances and contextual cues play a crucial role. The problem arises from the inherent limitations of
conventional ASR approaches, which cannot comprehensively understand and interpret contextual information efficiently,
resulting in inaccuracies, misinterpretations, and errors in transcriptions, especially in scenarios involving ambiguous or
context-dependent speech. Incorporating Natural Language Processing (NLP) techniques into ASR systems presents a
promising avenue to address this challenge.
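The contextual disambiguation described above can be illustrated with a toy rescoring sketch: when several homophone candidates receive nearly identical acoustic scores, a language model supplies the contextual cue that picks the right word. This is a minimal, illustrative example only; the bigram probabilities and scores below are invented for demonstration, and a real system would train a language model on a large corpus.

```python
import math

# Toy bigram language model with hand-picked log-probabilities.
# (Illustrative values only, not trained on real data.)
BIGRAM_LOGP = {
    ("over", "there"): math.log(0.8),
    ("over", "their"): math.log(0.1),
    ("over", "they're"): math.log(0.1),
}

def rescore(prev_word, candidates):
    """Pick the candidate maximizing acoustic + language-model score,
    i.e. a noisy-channel decode: argmax_W [log P(X|W) + log P(W)]."""
    return max(
        candidates,
        # Unseen bigrams get a small floor probability.
        key=lambda c: c[1] + BIGRAM_LOGP.get((prev_word, c[0]), math.log(1e-6)),
    )[0]

# Three homophones whose acoustic log-likelihoods are nearly identical;
# the acoustic model alone slightly prefers "their".
candidates = [("there", -1.0), ("their", -0.9), ("they're", -1.1)]
print(rescore("over", candidates))  # context selects "there"
```

Here the acoustic model alone would output "their" (score -0.9), but adding the bigram context after "over" flips the decision to "there", mirroring how NLP techniques can correct context-dependent ASR errors.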
Keywords :
Audio Signal Processing, Feature Extraction, Acoustic Modeling, Language Modeling, Decoding, Post-Processing.