Natural Language Processing-based Solution for Accurate Transcription and Translation of Distorted Multilingual Audio Signals | International Journal of Innovative Science and Research Technology

Natural Language Processing-based Solution for Accurate Transcription and Translation of Distorted Multilingual Audio Signals

Authors : Vivek Kanji Malam

Volume/Issue : Volume 8 - 2023, Issue 6 - June

Google Scholar : https://bit.ly/3TmGbDi

Scribd : https://tinyurl.com/37tjwaex

DOI : https://doi.org/10.5281/zenodo.8133465

Abstract : This research paper addresses the challenge of transcribing and translating noise-filled audio recordings that contain a mix of multiple languages and dialects. The objective is to develop a software-based tool capable of ingesting low-quality audio files, cleaning the signals, and creating accurate textual transcripts. The paper explores the unique difficulties posed by these recordings, including the presence of slang and local words not found in standard language models. Furthermore, the paper discusses the need for context-dependent translations and the provision of timestamps for efficient navigation. To overcome these challenges, the paper proposes the use of OpenAI Whisper Large-V2, a state-of-the-art machine learning model specifically designed to handle noise and low signal-to-noise ratios. Whisper Large-V2's extensive training on a dataset of 680,000 hours of audio in 100 languages, including non-ideal and noisy samples, makes it well-suited for this task. Additionally, its zero-shot learning capabilities and proficiency in handling multiple languages ensure reliable and high-quality results. The research concludes that Whisper Large-V2, with its balance of accuracy and speed, is the ideal model for transcribing and translating audio files containing noise and a mixture of languages and dialects.

Keywords : Transcribing and translating, Audio recordings, Noise, Multiple languages, Software-based tool, Textual transcript, NLP processing, Deep neural networks, Transcription accuracy, OpenAI Whisper.
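
The workflow summarized in the abstract (ingest an audio file, produce a transcript or translation, and attach timestamps for navigation) might be sketched roughly as follows. This is an illustrative sketch, not the author's implementation: it assumes the open-source `openai-whisper` Python package and `ffmpeg` are installed, and the helper names `format_timestamp`, `render_segments`, and `transcribe_file` are hypothetical. The signal-cleaning step mentioned in the abstract is not shown.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_segments(segments: Iterable[Mapping]) -> list[str]:
    """Turn Whisper-style segments into '[start --> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path: str, translate: bool = False,
                    model_name: str = "large-v2") -> list[str]:
    """Transcribe an audio file, or translate it to English, with timestamps.

    Hypothetical helper: the model name and defaults are assumptions.
    """
    # Imported lazily so the formatting helpers work without the package.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    # task="transcribe" keeps the spoken language; task="translate" yields English.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    # Each segment carries 'start' and 'end' offsets in seconds plus its 'text'.
    return render_segments(result["segments"])
```

Calling `transcribe_file("recording.wav", translate=True)` on a hypothetical file would return one timestamped line per segment. Note that Whisper's built-in translation task targets English only, so translation into other languages would need a separate step.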
