Authors :
Abhijith Suresh; Aevin Tom Anish; Simon Alexander; Sreelakshmi S; Tinku Soman Jacob
Volume/Issue :
Volume 8 - 2023, Issue 5 - May
Google Scholar :
https://bit.ly/3TmGbDi
Scribd :
https://tinyurl.com/e9xvuk4a
DOI :
https://doi.org/10.5281/zenodo.8081517
Abstract :
The landscape of video production and consumption on social media platforms has changed significantly as a result of widespread internet access and affordable video capture devices. By producing a brief description of each video, video summarization helps viewers rapidly grasp its content. However, conventional automatic techniques can place a severe burden on computing systems, while manual extraction is time-consuming and tends to miss a large amount of data. A multi-view description of a video can also be drawn from the rich textual content that typically accompanies videos on social media platforms, such as subtitles or bullet-screen comments. Here, a novel framework that jointly models visual and textual information is proposed to exploit both sources. Characters are first located with detection techniques and matched by re-identification modules to extract probable key-frames, which are then aggregated into a visual summary. Subtitles and bullet-screen comments are further used as multi-source textual information to generate a final text summary of the target character from the input video.
Keywords :
Video, Summary, Subtitles, Bullet-Screen, Frames.
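
To make the visual branch of the described pipeline concrete, the following is a minimal sketch in Python. It assumes hypothetical detect_people and reid_score helpers (stubbed here) standing in for whatever person detector and re-identification model the framework actually uses, and an illustrative similarity threshold; it illustrates the detection, re-identification, and key-frame aggregation steps only, not the authors' implementation.

from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    frame_index: int
    embedding: List[float]  # appearance feature from a re-ID model (assumed)

def detect_people(frame_index: int) -> List[Detection]:
    # Placeholder for a person detector run on the decoded frame;
    # a real system would return one Detection per detected person.
    return []

def reid_score(embedding: List[float], target: List[float]) -> float:
    # Placeholder similarity between a detection and the target character;
    # a real re-identification module would compare learned embeddings.
    return sum(a * b for a, b in zip(embedding, target))

def summarize(num_frames: int, target_embedding: List[float],
              threshold: float = 0.8) -> List[int]:
    # Select probable key-frames that contain the target character.
    key_frames = []
    for i in range(num_frames):
        for det in detect_people(i):
            if reid_score(det.embedding, target_embedding) >= threshold:
                key_frames.append(i)
                break  # one confident match is enough to keep the frame
    return key_frames  # aggregated downstream into the visual summary

if __name__ == "__main__":
    print(summarize(num_frames=100, target_embedding=[1.0, 0.0]))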