Automatic Video Generator


Authors : K Tresha; Kavya; Medhaa PB; Pragathi T

Volume/Issue : Volume 9 - 2024, Issue 12 - December

Google Scholar : https://tinyurl.com/4fsceapa

Scribd : https://tinyurl.com/25cuscwv

DOI : https://doi.org/10.5281/zenodo.14470731

Abstract : Text-to-video (T2V) generation is an emerging field in artificial intelligence, gaining traction with advances in deep learning models such as generative adversarial networks (GANs), diffusion models, and hybrid architectures. This paper provides a comprehensive survey of recent T2V methodologies, exploring approaches such as GAN-based frameworks, VQGAN-CLIP, IRC-GAN, OpenAI's Sora, and CogVideoX, which aim to transform textual descriptions into coherent video content. These models face challenges in maintaining semantic coherence, temporal consistency, and realistic motion across generated frames. We examine the architectural designs, methodologies, and applications of key models, highlighting the advantages and limitations of their approaches to video synthesis. Additionally, we discuss benchmark advancements, such as T2VBench, which plays a crucial role in evaluating temporal consistency and content alignment. This review sheds light on the strengths and limitations of existing approaches and outlines ethical considerations and future directions for T2V generation in the realm of generative AI.
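
To make the diffusion-based pipelines mentioned in the abstract more concrete, the following is a minimal, illustrative sketch of generating a short clip from a text prompt. It assumes the Hugging Face diffusers library and the publicly released "damo-vilab/text-to-video-ms-1.7b" checkpoint; it is not the pipeline of any specific model surveyed in this paper, and the exact shape of the returned frames can vary between diffusers versions.

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load a pretrained text-to-video diffusion pipeline (half precision to fit consumer GPUs).
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Generate a short clip from a textual description.
prompt = "a panda playing a guitar on a beach at sunset"
result = pipe(prompt, num_inference_steps=25, num_frames=16)
video_frames = result.frames[0]  # first (and only) video in the batch; format depends on diffusers version

# Write the generated frames to an .mp4 file.
export_to_video(video_frames, "generated_clip.mp4")
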

Keywords : Text-to-Video (T2V) Generation, Deep Learning, Generative Adversarial Networks (GANs), Diffusion Models, Hybrid Architectures, VQGAN-CLIP, IRC-GAN, Sora OpenAI, CogVideoX, Semantic Coherence, Temporal Consistency, Realistic Motion, Video Synthesis, Benchmark Advancements, T2VBench, Content Alignment, Ethical Considerations, Generative AI.
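
The abstract also highlights content alignment as one axis that benchmarks such as T2VBench evaluate. The sketch below illustrates one common way such alignment is approximated in practice: averaging CLIP similarity between the prompt and each generated frame. This is an illustration only, not T2VBench's official metric; it assumes the Hugging Face transformers library and the "openai/clip-vit-base-patch32" checkpoint, and the function name is hypothetical.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def content_alignment_score(prompt: str, frames: list) -> float:
    """Average cosine similarity between the prompt and each generated frame (PIL images)."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Normalize embeddings so the dot product is a cosine similarity.
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    # One similarity per frame; the mean summarizes prompt-video alignment.
    return (image_emb @ text_emb.T).squeeze(-1).mean().item()

A higher score suggests the generated frames stay semantically closer to the prompt, though frame-wise CLIP similarity alone does not capture temporal consistency, which benchmarks such as T2VBench assess separately.
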

References :

  1. TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator. Doyeon Kim, Donggyu Joo, and Junmo Kim. School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
  2. Generate Impressive Videos with Text Instructions: A Review of OpenAI Sora, Stable Diffusion, Lumiere and Comparable Models. Enis Karaarslan and Ömer Aydın.
  3. Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis. Yogesh Balaji, Martin Renqiang Min, Bing Bai, Rama Chellappa, and Hans Peter Graf. University of Maryland, College Park; NEC Labs America, Princeton.
  4. Transforming Text into Video: A Proposed Methodology for Video Production Using the VQGAN-CLIP Image Generative AI Model. SukChang Lee. Dept. of Digital Contents, Konyang University, Korea.
  5. To Create What You Tell: Generating Videos from Captions. Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, and Tao Mei. University of Science and Technology of China, Hefei, China; Microsoft Research, Beijing, China.
  6. Yitong Li, Martin Renqiang Min, Dinghan Shen, David Carlson, and Lawrence Carin. Duke University, Durham, NC, United States; NEC Laboratories America, Princeton, NJ, United States.
  7. AutoLV: Automatic Lecture Video Generator. Wenbin Wang, Yang Song, and Sanjay Jha. School of Computer Science and Engineering, University of New South Wales, Australia.
  8. Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation. Jiawei Liu, Weining Wang, Sihan Chen, Xinxin Zhu, and Jing Liu.
  9. IRC-GAN: Introspective Recurrent Convolutional GAN for Text-to-Video Generation. Kangle Deng, Tianyi Fei, Xin Huang, and Yuxin Peng. Institute of Computer Science and Technology, Peking University, Beijing.
  10. Sora OpenAI's Prelude: Social Media Perspectives on Sora OpenAI and the Future of AI Video Generation. Reza Hadi Mogavi, Derrick Wang, Joseph Tu, Hilda Hadan, Sabrina A. Sgandurra, Pan Hui, and Lennart E. Nacke. Stratford School of Interaction Design and Business, University of Waterloo, Canada; Hong Kong University of Science and Technology (Guangzhou), Hong Kong SAR and Guangzhou, China.
  11. CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer. Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, and Jie Tang.
  12. StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text. Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. Picsart AI Research (PAIR); UT Austin; SHI Labs @ Georgia Tech, Oregon & UIUC.
  13. TAVGBench: Benchmarking Text to Audible-Video Generation. Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, and Yuchao Dai. Northwestern Polytechnical University; OpenNLPLab, Shanghai AI Lab; Australian National University; TapTap; Hefei University of Technology.
  14. ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models. Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, and Zhiwei Xiong. University of Science and Technology of China; Microsoft Research Asia.
  15. Rescribe: Authoring and Automatically Editing Audio Descriptions. Amy Pavel, Gabriel Reyes, and Jeffrey P. Bigham.
  16. T2VBench: Benchmarking Temporal Dynamics for Text-to-Video Generation. Pengliang Ji, Chuyang Xiao, Huilin Tai, and Mingxiao Huo. Carnegie Mellon University; ShanghaiTech University; McGill University.
  17. Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Show Lab, National University of Singapore; ARC Lab, Tencent PCG.
  18. LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models. Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, and Ziwei Liu.
  19. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers. Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, and Jie Tang. Tsinghua University; BAAI.
  20. To Create What You Tell: Generating Videos from Captions. Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, and Tao Mei. University of Science and Technology of China, Hefei, China; Microsoft Research, Beijing, China.
