Survey on Speech to Text Modelling for the Shona Language | International Journal of Innovative Science and Research Technology

Survey on Speech to Text Modelling for the Shona Language

Authors : B Mupini; S Chaputsira; Bk Sibanda

Volume/Issue : Volume 9 - 2024, Issue 1 - January

Google Scholar : http://tinyurl.com/26bz8f4f

Scribd : http://tinyurl.com/3v5dkphd

DOI : https://doi.org/10.5281/zenodo.10609671

Abstract : Converting speech to text (STT) is of great interest for many applications, and accommodating the spoken languages of Africa calls for innovative technological approaches. However, African countries are falling behind in adopting STT technologies, and Automatic Speech Recognition (ASR) work has so far concentrated on popular East African languages. This has kept transcription to a minimum and has slowed the use of many African languages on a worldwide scale. A further problem is that a single African language may encompass several dialects. This research reviews modern technologies and models that have been used to build ASR and STT systems for African languages, along with existing datasets, with particular interest in the Shona language spoken by the people of Zimbabwe. The survey of STT for Shona identifies existing techniques that can be used to achieve effective STT for this language. An example of such a technique is accounting for the procedures taken to convert spoken words into text that can be displayed. ASR techniques can help in many application areas, such as assisting individuals with hearing impairment, transcription services, voice commands and control, dictation and note taking, language learning and translation, customer service and support, and voice search and content indexing. ASR is prominent together with related technologies such as STT conversion, Text to Speech (TTS) conversion and language translation. Together, these technologies have helped bridge the gap between people who speak different languages, especially tourists and language enthusiasts. In African countries, most of which are underdeveloped, many spoken languages are underrepresented and low-resourced, which has hampered the advancement of ASR technology for these low-resource languages. Bridging this gap will result in African languages, especially Shona, being more widely recognized in the world and finding use in everyday applications and technologies.

Keywords : Transcribe, Dataset, Models, Dialect, Conversion.
