Image Caption Generator Using CNN and LSTM


Authors : Monali Kapuriya; Zemi Lakkad; Satwi Shah

Volume/Issue : Volume 9 - 2024, Issue 8 - August

Google Scholar : https://tinyurl.com/2r682n3n

Scribd : https://tinyurl.com/msjaassb

DOI : https://doi.org/10.38124/ijisrt/IJISRT24AUG851

Abstract : In this study, we explore the integration of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for image caption generation, a task that fuses natural language processing and computer vision techniques to describe images in English. Delving into the realm of image captioning, we investigate several fundamental concepts and methodologies associated with this area. Our approach leverages prominent tools such as the Keras library, NumPy, and Jupyter notebooks to facilitate the development of our research. Furthermore, we examine the use of the flickr_dataset and CNNs for image classification, elucidating their significance in our experiments. Through this research endeavor, we aim to contribute to the development of image captioning systems by combining state-of-the-art techniques from both the computer vision and natural language processing domains.

Keywords : CNN, LSTM, Image Captioning, Deep Learning.
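
Code Sketch : The abstract describes a CNN image encoder combined with an LSTM caption decoder, built with the Keras library and NumPy. Since no code appears on this page, the sketch below shows one common way such a model is wired together (the "merge" design): a pretrained CNN supplies a fixed-length image feature vector, an LSTM encodes the partial caption, and the two are combined to predict the next word. All concrete values here (the 2048-dimensional feature size, the vocabulary size, the maximum caption length, and the 256-unit layers) are illustrative assumptions, not figures taken from the paper.

# Minimal sketch of a CNN + LSTM caption generator in Keras.
# All dimensions (feature size, vocab size, caption length, layer
# widths) are illustrative assumptions, not values from the paper.
import numpy as np
from tensorflow.keras.layers import (Input, Dense, Dropout,
                                     Embedding, LSTM, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 5000  # assumed vocabulary size
MAX_LEN = 34       # assumed maximum caption length in tokens

# Image branch: 2048-d features assumed to come from a pretrained
# CNN encoder (e.g., InceptionV3 with its classifier head removed).
img_input = Input(shape=(2048,))
img_feat = Dropout(0.5)(img_input)
img_feat = Dense(256, activation='relu')(img_feat)

# Text branch: partial caption -> embedding -> LSTM summary vector.
seq_input = Input(shape=(MAX_LEN,))
seq_feat = Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_input)
seq_feat = Dropout(0.5)(seq_feat)
seq_feat = LSTM(256)(seq_feat)

# Merge the two modalities and predict the next caption word.
decoder = add([img_feat, seq_feat])
decoder = Dense(256, activation='relu')(decoder)
output = Dense(VOCAB_SIZE, activation='softmax')(decoder)

model = Model(inputs=[img_input, seq_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Usage example with dummy inputs: one image feature vector and a
# partial caption containing a single (hypothetical) start token.
dummy_img = np.zeros((1, 2048), dtype='float32')
dummy_seq = np.zeros((1, MAX_LEN), dtype='int32')
dummy_seq[0, 0] = 1  # assumed index of the start-of-caption token
pred = model.predict([dummy_img, dummy_seq])  # shape (1, VOCAB_SIZE)

At inference time such a model is applied word by word: starting from a start token, the caption generated so far is fed back through the text branch, and decoding stops at an end token or at the maximum caption length.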
