Harnessing Open Innovation for Translating Global Languages into Indian Languages


Authors : Y V Nagesh Meesala; V Sai Surya; P Sai Kiran; R Sree Vardhan

Volume/Issue : Volume 9 - 2024, Issue 4 - April

Google Scholar : https://tinyurl.com/47jbk744

Scribd : https://tinyurl.com/3puss745

DOI : https://doi.org/10.38124/ijisrt/IJISRT24APR1907

Abstract : Understanding foreign languages can be challenging for people living in India's diverse linguistic landscape. We propose a system that addresses this problem with machine translation, focusing specifically on speech recognition and speech synthesis. It converts online video resources into Indian languages by integrating open-source technologies such as speech-to-text (STT) and text-to-speech (TTS) systems with the FFmpeg library, which is used to separate and recombine audio and video. We use the Whisper model, which accepts audio input in up to 60 languages and transcribes it into text split into sentence segments with timestamps. The sentence-level transcription produced by Whisper is then translated into the target language using Google Cloud translate_v2. Each timestamped segment is then converted to audio individually using the Google Cloud Text-to-Speech service, ensuring that the audio fits within the duration of its timestamp. The individual audio segments are combined to produce the final audio in the target language. Finally, this audio is attached to the original video, keeping video and audio synchronized. The accuracy of the translation was verified by comparing the naturalness of the generated audio with general spoken-language standards. The application benefits visually impaired individuals and those who cannot read text, giving them a means to acquire knowledge in their native languages.

Keywords : Open Innovation, Text-to-Speech, Speech-to-Text, Machine Translation.
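
The abstract's timing step (fitting each synthesized clip inside its timestamp, then attaching the combined audio to the video) can be illustrated with a minimal sketch. It is not the authors' implementation: the names Segment, speed_factor, plan_timeline and mux_command, the 1.5x rate cap, and the file names are all hypothetical, and the synthesized clip durations are assumed to be already known. The transcription, translation and speech synthesis calls are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    start: float  # seconds, taken from the transcription timestamps
    end: float    # seconds
    text: str     # translated sentence for this segment


def speed_factor(slot: float, synthesized: float, max_rate: float = 1.5) -> float:
    """Playback rate needed for `synthesized` seconds of speech to fit `slot` seconds.

    Returns 1.0 when the clip already fits. The cap `max_rate` is an
    assumption of this sketch; a clip that needs more than the cap will
    still overrun its slot and would need separate handling.
    """
    if slot <= 0:
        raise ValueError("segment must have a positive duration")
    if synthesized <= slot:
        return 1.0
    return min(synthesized / slot, max_rate)


def plan_timeline(segments: List[Segment], durations: List[float]) -> List[Dict[str, float]]:
    """For each segment, compute the rate to apply and the silence to pad after it."""
    plan = []
    for seg, dur in zip(segments, durations):
        slot = seg.end - seg.start
        rate = speed_factor(slot, dur)
        fitted = dur / rate  # clip length after the rate change
        plan.append({
            "start": seg.start,
            "rate": rate,
            "padding": max(slot - fitted, 0.0),
        })
    return plan


def mux_command(video: str, audio: str, output: str) -> List[str]:
    """Build an FFmpeg command that keeps the original video stream and
    replaces the audio track with the dubbed one (video is copied, not re-encoded)."""
    return [
        "ffmpeg", "-i", video, "-i", audio,
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "copy", "-shortest", output,
    ]


if __name__ == "__main__":
    segs = [Segment(0.0, 2.0, "first sentence"), Segment(2.0, 5.0, "second sentence")]
    for row in plan_timeline(segs, [3.0, 1.5]):
        print(row)
    print(" ".join(mux_command("input.mp4", "dubbed.wav", "output.mp4")))
```

In this sketch a clip longer than its slot is sped up and a shorter clip is followed by silence, so every segment starts at its original timestamp. Whether the rate change is applied at synthesis time or in post-processing is left open here.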

