


Real-Time Sign Language and Audio Conversion with AI


Authors : Yoheswari S.; Jiyaudeen N.; Praveen Kumar A.; Ram Kumar P.

Volume/Issue : Volume 11 - 2026, Issue 4 - April


Google Scholar : https://tinyurl.com/2zj2kkrt

Scribd : https://tinyurl.com/bzx3dp5z

DOI : https://doi.org/10.38124/ijisrt/26apr193

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.


Abstract : Communication barriers between individuals with auditory or speech impairments and the general population present significant obstacles in daily interactions, education, healthcare, and employment. A wide linguistic gap separates those who communicate through spoken language from those who communicate primarily through sign language. To bridge this divide, this study presents a real-time, two-way Sign Language Translator system built using modern computer vision, deep learning architectures, and a web-based framework. The proposed solution facilitates bidirectional communication through two core modules: an Audio-to-Sign module, which transcribes spoken language into text and maps it to corresponding Indian Sign Language (ISL) animations, and a Sign-to-Audio module, which recognizes hand gestures in real time and translates them into synthesized spoken English. The system leverages the MediaPipe Hands framework for rapid and robust hand landmark extraction, augmented by a customized MobileNet Convolutional Neural Network (CNN) for localized gesture classification. The logic is wrapped in a Django backend that provides stateful session management, database-backed user profiles, and seamless usability. The results indicate high accuracy across varied background conditions while keeping the architecture lightweight enough for real-time response.
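The "localized gesture classification" step of the Sign-to-Audio module suggests a common pipeline: MediaPipe Hands supplies 21 normalized landmarks per detected hand, and those landmarks localize a crop of the frame that is then resized and fed to the MobileNet classifier. The paper does not publish its preprocessing, so the helper below is a hypothetical sketch of that localization step; the `margin` padding value and the (x, y)-in-[0, 1] landmark convention (which is how MediaPipe Hands reports coordinates) are assumptions.

```python
from typing import Sequence, Tuple

def hand_crop_box(
    landmarks: Sequence[Tuple[float, float]],  # normalized (x, y) in [0, 1]
    img_w: int,
    img_h: int,
    margin: float = 0.15,  # relative padding around the hand (assumed value)
) -> Tuple[int, int, int, int]:
    """Pixel bounding box (x0, y0, x1, y1) enclosing the hand landmarks.

    The box is padded by `margin` and clamped to the frame; the resulting
    patch would be resized (e.g. to 224x224) before classification by a
    MobileNet-style CNN.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    x0 = max(0, int((min(xs) - pad_x) * img_w))
    y0 = max(0, int((min(ys) - pad_y) * img_h))
    x1 = min(img_w, int((max(xs) + pad_x) * img_w))
    y1 = min(img_h, int((max(ys) + pad_y) * img_h))
    return x0, y0, x1, y1
```

Working on a small landmark-derived crop rather than the full frame is one way such a system could stay accurate across varied backgrounds while remaining light enough for real-time inference.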

Keywords : Sign Language Recognition (SLR), Deep Learning, MobileNet, MediaPipe Hands, Speech Recognition, Indian Sign Language (ISL), Accessibility Technology, Convolutional Neural Networks, Django.
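The Audio-to-Sign module described in the abstract maps transcribed speech to ISL animations. A minimal sketch of that mapping step is shown below; the `ISL_ANIMATIONS` dictionary, the clip file names, and the letter-by-letter fingerspelling fallback for out-of-vocabulary words are all hypothetical illustrations (the fallback is a common strategy in audio-to-sign pipelines, but the paper does not specify its approach).

```python
import re
from typing import Dict, List

# Hypothetical keyword-to-animation mapping; the system's actual
# dictionary of ISL clips is not published in the paper.
ISL_ANIMATIONS: Dict[str, str] = {
    "hello": "hello.mp4",
    "thank": "thank_you.mp4",
    "you": "you.mp4",
    "help": "help.mp4",
}

def text_to_isl_clips(transcript: str) -> List[str]:
    """Map a speech-recognition transcript to an ordered list of ISL clips.

    Known words map to whole-sign animations; unknown words fall back to
    letter-by-letter fingerspelling clips so no word is silently dropped.
    """
    clips: List[str] = []
    for word in re.findall(r"[a-z]+", transcript.lower()):
        if word in ISL_ANIMATIONS:
            clips.append(ISL_ANIMATIONS[word])
        else:
            clips.extend(f"letter_{ch}.mp4" for ch in word)
    return clips
```

In the full system, a speech-recognition front end (the paper does not name which one) would produce the transcript, and the returned clip list would be played back in sequence in the browser.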


