Authors :
Monali Kapuriya; Zemi Lakkad; Satwi Shah
Volume/Issue :
Volume 9 - 2024, Issue 8 - August
Google Scholar :
https://tinyurl.com/2r682n3n
Scribd :
https://tinyurl.com/msjaassb
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24AUG851
Abstract :
In this study, we explore the
integration of Convolutional Neural Networks (CNNs)
and Long Short-Term Memory (LSTM) networks for
the purpose of image caption generation, a task that
fuses natural language processing and computer
vision techniques to describe images in English.
Delving into the realm of image captioning, we
investigate several fundamental concepts and
methodologies associated with this area. Our approach
leverages prominent tools such as the Keras library,
NumPy, and Jupyter notebooks to facilitate the
development of our research. Furthermore, we examine
the use of the Flickr dataset and CNNs for image
classification, elucidating their significance to our
study. Through this research endeavor, we aim to
contribute to the development of image captioning
systems by combining state-of-the-art techniques from
both the computer vision and natural language
processing domains.
Keywords :
CNN, LSTM, Image Captioning, Deep Learning.
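Illustrative Sketch :
As a rough illustration of the CNN-LSTM pairing the abstract describes, the Python/Keras sketch below wires a pre-extracted CNN image feature vector to an LSTM language model that predicts the next caption word. All specifics are assumptions for illustration, not values from the paper: the 2048-dimensional feature input (typical of a pre-trained encoder such as InceptionV3), the 256-unit layer sizes, and the MAX_LEN and VOCAB_SIZE constants.

# Minimal CNN + LSTM captioning model sketch in Keras.
# All sizes below (2048-d image features, 256 units, MAX_LEN,
# VOCAB_SIZE) are illustrative assumptions, not the paper's values.
from tensorflow.keras.layers import (Input, Dense, Dropout,
                                     Embedding, LSTM, add)
from tensorflow.keras.models import Model

MAX_LEN = 34        # longest caption length, in tokens (assumed)
VOCAB_SIZE = 8000   # caption vocabulary size (assumed)

# Image branch: a feature vector pre-extracted by a CNN encoder
# is projected into the same 256-d space as the text features.
img_input = Input(shape=(2048,), name="cnn_features")
img_feats = Dropout(0.5)(img_input)
img_feats = Dense(256, activation="relu")(img_feats)

# Text branch: the partial caption generated so far is embedded
# and summarized by an LSTM into a single 256-d state.
txt_input = Input(shape=(MAX_LEN,), name="caption_so_far")
txt_feats = Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_input)
txt_feats = Dropout(0.5)(txt_feats)
txt_feats = LSTM(256)(txt_feats)

# Merge the two modalities and predict the next caption word.
merged = add([img_feats, txt_feats])
merged = Dense(256, activation="relu")(merged)
next_word = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[img_input, txt_input], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()

At inference time, captions would be generated word by word under this design: feed the image features together with the tokens produced so far, pick the next word from the softmax output, and repeat until an end-of-sequence token is emitted.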