Authors :
Vivek Kanji Malam
Volume/Issue :
Volume 8 - 2023, Issue 6 - June
Google Scholar :
https://bit.ly/3TmGbDi
Scribd :
https://tinyurl.com/37tjwaex
DOI :
https://doi.org/10.5281/zenodo.8133465
Abstract :
This paper addresses the challenge of transcribing and translating noisy audio recordings that mix multiple languages and dialects. The objective is to develop a software tool that ingests low-quality audio files, cleans the signals, and produces accurate textual transcripts. The paper examines the difficulties specific to these recordings, including slang and local words not found in standard language models. It also discusses the need for context-dependent translation and for timestamps that allow efficient navigation. To overcome these challenges, the paper proposes OpenAI Whisper Large-V2, a state-of-the-art machine learning model designed to be robust to noise and low signal-to-noise ratios. Whisper Large-V2's training on 680,000 hours of audio in roughly 100 languages, including non-ideal and noisy samples, makes it well suited to this task. Its zero-shot capabilities and proficiency across multiple languages further support reliable, high-quality results. The paper concludes that Whisper Large-V2, with its balance of accuracy and speed, is the ideal model for transcribing and translating audio files that contain noise and a mixture of languages and dialects.
Keywords :
Transcribing and translating, Audio recordings, Noise, Multiple languages, Software-based tool, Textual transcript, NLP processing, Deep neural networks, Transcription accuracy, OpenAI Whisper.
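
The workflow outlined in the abstract (transcribe or translate a recording, then attach timestamps for navigation) might be sketched roughly as follows. This is a minimal illustration and not the paper's implementation. It assumes the open-source `openai-whisper` Python package is installed; the helper names `format_timestamp`, `segments_to_lines`, and `transcribe_file` are hypothetical and introduced only for this example, and the audio-cleaning step mentioned in the abstract is not shown.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(segments: List[Dict]) -> List[str]:
    """Turn segment dicts (keys: start, end, text) into timestamped lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path: str, translate: bool = False,
                    model_name: str = "large-v2") -> List[str]:
    """Hypothetical wrapper: run Whisper on one file and return timestamped lines.

    Assumes the openai-whisper package; the import is local so the
    formatting helpers above can be used without it.
    """
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    # task="translate" asks Whisper for English output; "transcribe" keeps
    # the spoken language.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    return segments_to_lines(result["segments"])
```

Calling `transcribe_file("interview.wav", translate=True)` (the file name is a placeholder) would return one line per segment, each prefixed with its start and end time, which is one simple way to provide the navigation aid the abstract describes.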