Advancements in diffusion models for high-resolution image and short form video generation
1 Kenan-Flagler Business School, University of North Carolina at Chapel Hill, North Carolina, USA.
2 Fuqua School of Business, Duke University, Durham, North Carolina, USA.
3 Ross School of Business, University of Michigan, USA.
4 School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow, United Kingdom.
5 Community and Program Specialist, UHAI For Health Inc, Worcester, Massachusetts, USA.
GSC Advanced Research and Reviews, 2024, 21(02), 508–520.
Article DOI: 10.30574/gscarr.2024.21.2.0441
Publication history:
Received on 07 October 2024; revised on 17 November 2024; accepted on 19 November 2024
Abstract:
This paper offers an in-depth review of recent advancements in diffusion models, highlighting their transformative role in high-resolution image generation and their emerging applications in short-form video generation. Diffusion models, a class of generative models rooted in probabilistic frameworks, have rapidly gained prominence because they produce photorealistic, detailed outputs by reversing a noise-infusion process. Their strength lies in generating high-quality media that overcomes limitations of earlier generative models such as GANs, especially in diversity and training stability. The study used a systematic search strategy across five scientific databases (PubMed, Google Scholar, Scopus, IEEE, and Science Direct) to identify relevant research articles. Books, dissertations, master's theses, and conference proceedings were also consulted. The review covers publications up to 2024. It begins with the fundamental principles of diffusion models, which revolve around gradually adding noise to an image or video and then removing it over a series of time steps. This section emphasizes the mathematical foundation of diffusion processes, particularly the forward process of noise addition and the reverse process of denoising, which enables these models to generate media with fine detail. A significant portion of the review is dedicated to the impact of diffusion models on high-resolution image and short-form video generation, areas where they have been especially transformative, as well as to success metrics for evaluating short-form video generation, curation, and summarization. In conclusion, this paper provides a comprehensive exploration of how diffusion models have reshaped the landscape of media generation. From their foundational principles and technical evolution to their applications in high-resolution media and short-form video, the paper highlights both the profound potential of these models and the ongoing challenges that must be addressed for their responsible and scalable use.
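The forward (noise addition) and reverse (denoising) processes named in the abstract can be illustrated with a minimal sketch. It assumes the widely used DDPM-style formulation, in which a noised sample is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear schedule, its endpoint values, and all function names below are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule: beta_t rises from beta_start to beta_end."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(x0, t, alpha_bars, rng):
    """Forward process: sample x_t from x_0 in closed form; returns (x_t, eps)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def recover_x0(xt, eps, t, alpha_bars):
    """Invert the forward step using the exact noise.

    A trained denoising network would only estimate eps; the true value is
    used here to show the algebra that the reverse process relies on.
    """
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps)]

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha(betas)
x0 = [0.5, -0.25, 1.0, 0.0]  # stand-in for four pixel values
xt, eps = forward_noise(x0, 999, alpha_bars, rng)
x0_hat = recover_x0(xt, eps, 999, alpha_bars)
```

Under this assumed schedule, alpha_bar shrinks toward zero at the final step, so x_t is almost pure Gaussian noise. Generation then amounts to learning to predict eps and stepping back toward x_0.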
Keywords:
Diffusion Models; Video Creation; High Resolution Image; Short-form Video
Copyright information:
Copyright © 2024 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.