Image Captionbot for Assistive Technology

Authors : Arnold Abraham; Aby Alias; Vishnumaya

Volume/Issue : Volume 7 - 2022, Issue 2 - February



Because a single image can be described in many different ways, automatically generating a short description of its content is difficult: images carry many kinds of information, and extracting context from them and constructing well-formed sentences from that context is a hard problem. A system that can do this reliably allows blind people to explore their surroundings independently. Deep learning makes such a system feasible. This project uses VGG16, a well-established CNN architecture, for image classification and feature extraction, and an embedding layer together with an LSTM for the text-description process. These two networks are combined to form an image caption generation network, which is then trained on the Flickr8k dataset. Finally, the model's output caption is converted to audio for the benefit of visually impaired users.
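The caption-generation step described above can be sketched as a greedy decoding loop: image features (e.g. from VGG16) condition a recurrent decoder that emits one word at a time until an end token is produced. The sketch below is a minimal, framework-free illustration of that loop; `predict_next` and its transition table are hypothetical stand-ins for the trained LSTM decoder, not part of the paper.

```python
# Minimal greedy caption-decoding sketch.
# In the real system, `predict_next` would be the trained
# VGG16 + embedding + LSTM model; here it is a hypothetical
# stub driven by a fixed word-to-word transition table.

START, END = "<start>", "<end>"

# Hypothetical transition table standing in for the LSTM's
# softmax output (an assumption for illustration only).
TRANSITIONS = {
    START: "a",
    "a": "dog",
    "dog": "running",
    "running": "on",
    "on": "grass",
    "grass": END,
}

def predict_next(image_features, partial_caption):
    """Stub for the trained decoder: return the most likely next word."""
    return TRANSITIONS[partial_caption[-1]]

def generate_caption(image_features, max_len=20):
    """Greedy decoding: append the most probable word until <end>."""
    caption = [START]
    for _ in range(max_len):
        word = predict_next(image_features, caption)
        if word == END:
            break
        caption.append(word)
    return " ".join(caption[1:])

print(generate_caption(image_features=None))  # → a dog running on grass
```

In the full system, the decoder's output string would then be passed to a text-to-speech component to produce the audio output mentioned in the abstract.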

Keywords : Deep Learning; Recurrent Neural Network; Convolutional Neural Network; VGG16; LSTM.

