Image captionbot for assistive technology | International Journal of Innovative Science and Research Technology

Image Captionbot for Assistive Technology

Authors : Arnold Abraham; Aby Alias; Vishnumaya

Volume/Issue : Volume 7 - 2022, Issue 2 - February

Google Scholar : http://bitly.ws/gu88

DOI : https://doi.org/10.5281/zenodo.6341477

Abstract : Because an image can have a variety of meanings in different languages, generating a short description of it automatically is difficult. An image also contains many different types of information, which makes it hard to extract context and turn that context into a sentence. An automatic image captioning system can help blind people explore their surroundings independently, and deep learning can be used to build such a system. This project uses VGG16, a well-established CNN architecture, for image classification and feature extraction. For the text description, an LSTM and an embedding layer are used. The two networks are combined to form an image caption generation network, which is then trained on the Flickr8k dataset. The model's output is converted to audio for the benefit of people who are visually impaired.

Keywords : Deep Learning; Recurrent neural network; Convolutional neural network; VGG16; LSTM.
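
The abstract does not describe how captions are prepared for the embedding layer and LSTM. As a minimal sketch of one common setup, and not necessarily the authors' method, each caption can be wrapped in start and end markers, mapped to integer ids, and split into (prefix, next word) training pairs. The marker names `startseq` and `endseq`, the helper names, the sample captions, and `max_len` are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Assumed marker tokens; the paper does not specify them.
START, END, PAD = "startseq", "endseq", "<pad>"


def build_vocab(captions: List[str]) -> Dict[str, int]:
    """Map each word to an integer id; id 0 is reserved for padding."""
    vocab = {PAD: 0}
    for caption in captions:
        for word in [START, *caption.lower().split(), END]:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(caption: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a caption into ids, wrapped in the start and end markers."""
    return [vocab[w] for w in [START, *caption.lower().split(), END]]


def training_pairs(
    caption: str, vocab: Dict[str, int], max_len: int
) -> List[Tuple[List[int], int]]:
    """Split one caption into (left-padded prefix, next-word id) pairs."""
    ids = encode(caption, vocab)
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i][-max_len:]                      # keep at most max_len ids
        padded = [0] * (max_len - len(prefix)) + prefix  # left-pad with the pad id
        pairs.append((padded, ids[i]))
    return pairs


# Made-up captions for illustration; these are not taken from Flickr8k.
captions = ["a dog runs", "a child plays"]
vocab = build_vocab(captions)
pairs = training_pairs(captions[0], vocab, max_len=5)
for prefix, target in pairs:
    print(prefix, "->", target)
```

In a setup like this, each padded prefix would be fed to the embedding layer and LSTM alongside the VGG16 feature vector of the corresponding image, and the network would be trained to predict the target word.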
