Authors :
Yoheswari S.; Jiyaudeen N.; Praveen Kumar A.; Ram Kumar P.
Volume/Issue :
Volume 11 - 2026, Issue 4 - April
Google Scholar :
https://tinyurl.com/2zj2kkrt
Scribd :
https://tinyurl.com/bzx3dp5z
DOI :
https://doi.org/10.38124/ijisrt/26apr193
Abstract :
Communication barriers between individuals with auditory or speech impairments and the general population present significant obstacles in daily interactions, education, healthcare, and employment. A wide linguistic gap separates those who communicate through spoken language from those who communicate primarily through sign language. To bridge this divide, this study presents a real-time, two-way Sign Language Translator system built with modern computer vision, deep learning architectures, and a web-based framework. The proposed solution facilitates bidirectional communication through two core modules: an Audio-to-Sign module, which transcribes spoken language into text and maps it to corresponding Indian Sign Language (ISL) animations, and a Sign-to-Audio module, which recognizes hand gestures in real time and translates them into synthesized spoken English. The system leverages the MediaPipe Hands framework for fast, robust extraction of 21 hand landmarks per hand, augmented by a customized MobileNet Convolutional Neural Network (CNN) for gesture classification. The application logic is wrapped in a Django backend that provides stateful session management, database-backed user profiles, and a seamless user experience. Results indicate high recognition accuracy under varied background conditions while keeping the architecture lightweight enough for real-time response.
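To make the Sign-to-Audio pipeline concrete, the sketch below shows one common preprocessing step for landmark-based gesture classification: normalizing the 2-D hand landmarks that a detector such as MediaPipe Hands emits (21 points in normalized image coordinates) so the feature vector is invariant to hand position and distance from the camera. This is an illustrative sketch, not the paper's implementation; the function name and the wrist-relative scheme are assumptions.

```python
def normalize_landmarks(landmarks):
    """Translate landmarks so the wrist (index 0) sits at the origin,
    then scale so the largest coordinate magnitude is 1.0.

    `landmarks` is a list of (x, y) tuples in image coordinates, with
    the wrist at index 0, as MediaPipe Hands orders its 21 keypoints.
    The result is invariant to where the hand appears in the frame and
    to how close it is to the camera.
    """
    wx, wy = landmarks[0]
    translated = [(x - wx, y - wy) for x, y in landmarks]
    # Guard against a degenerate all-identical input with `or 1.0`.
    scale = max(max(abs(x), abs(y)) for x, y in translated) or 1.0
    return [(x / scale, y / scale) for x, y in translated]
```

In a pipeline like the one described, the normalized points (flattened to a 42-value vector for 21 landmarks) would be fed to the gesture classifier, so that the same sign produces near-identical features regardless of where the hand sits in the frame.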
Keywords :
Sign Language Recognition (SLR), Deep Learning, MobileNet, MediaPipe Hands, Speech Recognition, Indian Sign Language (ISL), Accessibility Technology, Convolutional Neural Networks, Django.
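The Audio-to-Sign direction described in the abstract reduces to a mapping problem once speech has been transcribed: each recognized word is looked up in a library of pre-rendered ISL animation clips, with a letter-by-letter fingerspelling fallback for out-of-vocabulary words. The sketch below illustrates that mapping stage under stated assumptions; the clip library, file names, and `letter_` convention are hypothetical, not from the paper.

```python
def text_to_isl_sequence(text, clip_library):
    """Map transcribed speech to an ordered list of ISL animation assets.

    `clip_library` maps lowercase words to animation clip identifiers.
    Words missing from the library fall back to fingerspelling, one
    `letter_<ch>` entry per alphabetic character.
    """
    sequence = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")   # drop trailing punctuation from the transcript
        if not word:
            continue
        if word in clip_library:
            sequence.append(clip_library[word])
        else:
            sequence.extend(f"letter_{ch}" for ch in word if ch.isalpha())
    return sequence
```

A frontend would then play the returned clips in order; the fingerspelling fallback keeps the system usable even when the animation vocabulary is small.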
References :
- C. Lugaresi et al., "MediaPipe: A Framework for Building Perception Pipelines," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 1–9.
- A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, and W. Wang, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861, Apr. 2017, DOI: 10.48550/arXiv.1704.04861.
- K. B. M. Kumar and M. V. R. Rao, "Indian Sign Language Recognition System using CNN and Image Processing," International Journal of Engineering Research & Technology (IJERT), vol. 9, no. 6, pp. 45–52, Jun. 2020.
- S. K. V., A. Sharma, and T. R. S., "A Survey of Real-Time Hand Gesture Recognition and Sign Language Translation Techniques," IEEE Access, vol. 11, pp. 6421–6443, Jan. 2023, DOI: 10.1109/ACCESS.2023.3237798.
- P. Agarwal and S. R. N. Reddy, "Bidirectional Sign Language Translation System using Text-to-Speech and Visual Mapping," Procedia Computer Science, vol. 165, pp. 323–333, Sep. 2020, DOI: 10.1016/j.procs.2020.01.074.
- M. A. Asghar, M. Khan, and S. Ahmad, "Deep Learning-based Real-Time Indian Sign Language Recognition System," in Proc. Int. Conf. on Machine Learning and Cybernetics (ICMLC), 2021, pp. 1245–1254.
- F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1251–1258.
- R. Basnet, A. H. Sung, and Q. Liu, "Learning to Detect Hand Gestures using Spatial Coordinate Normalization," International Journal of Research in Engineering and Technology, vol. 3, no. 6, pp. 11–24, 2019.
- O. K. Sahingoz, E. Buber, O. Demir, and B. Diri, "Machine Learning Based Gesture Detection from Webcams," Expert Systems with Applications, vol. 117, pp. 345–357, 2019.
- R. S. Rao and A. R. Pais, "Detection of Dynamic Hand Gestures Using an Efficient Feature-Based Machine Learning Framework," Neural Computing and Applications, vol. 31, no. 8, pp. 3851–3873, 2019.
- R. Verma and A. Das, "Robust Speech-to-Sign Translation: Fast Feature Extraction and Keyword Mapping," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 1–6.
- A. C. Bahnsen, E. C. Bohorquez, S. Villegas, J. Vargas, and F. A. Gonzalez, "Classifying Sign Language Formations Using Recurrent Neural Networks," in IEEE Conf. Intelligence and Security Informatics, 2021, pp. 1–6.
- S. Marchal, K. Saari, N. Singh, and N. Asokan, "Bridging the Gap: Novel Techniques for Bidirectional Translation and Human-Computer Interaction," in IEEE Int. Conf. Distributed Computing Systems, 2020, pp. 323–333.
- R. M. Mohammad, F. Thabtah, and L. McCluskey, "Predicting Visual Gestures Based on Self-Structuring Neural Networks," Neural Computing and Applications, vol. 25, no. 2, pp. 443–458, 2018.
- Y. Zhang, J. I. Hong, and L. F. Cranor, "Voice-to-Text: A Content-Based Approach to Animating Virtual Signs," in Proc. WWW Conf., 2019, pp. 639–648.