Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (6): 1404-1420. DOI: 10.3778/j.issn.1673-9418.2401075

• Frontiers·Surveys •

Survey of AI Painting

ZHANG Zeyu, WANG Tiejun, GUO Xiaoran, LONG Zhilei, XU Kui   

  1. Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou 730030, China
  2. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, China
  • Online: 2024-06-01 Published: 2024-05-31

AI绘画研究综述

张泽宇,王铁君,郭晓然,龙智磊,徐魁   

  1. 西北民族大学 中国民族语言文字信息技术教育部重点实验室,兰州 730030
  2. 西北民族大学 数学与计算机科学学院,兰州 730030

Abstract: AI painting, a popular research direction in computer vision, is expanding its application boundaries in art creation, film and media, industrial design, and art education through natural language processing, large image-text pre-trained models, and emerging diffusion models. This survey takes the two types of AI painting tasks, image-to-image and text-to-image, as its main lines and analyzes the representative models and their key technologies and methods in depth. For image-to-image generation, the development lineage, generation principles, and advantages and disadvantages of models based on autoencoders (AE) and on generative adversarial networks (GAN) are explored, and their results on public datasets are summarized. For text-to-image generation, the structural differences among three types of models, including those based on diffusion models, are summarized, together with the generation results of the various models on three datasets. It is pointed out that text-to-image generation with diffusion models has become a current research hotspot, which foreshadows the diversified development of image generation methods in the future. The current mainstream AI painting platforms are then compared and summarized from perspectives such as usage mode and generation speed. Finally, after summarizing the problems and controversies that AI painting faces at the technical and social levels, future trends are envisioned, such as the complementary development of AI painting and human artists, increased interactivity in the painting process, and the emergence of new professions and industries.

Key words: AI painting, image-to-image, text-to-image, image generation, artificial intelligence generated content (AIGC)

摘要: AI绘画,作为计算机视觉领域的热门研究方向,正通过自然语言处理技术、图文预训练大模型,以及新兴的扩散模型,不断拓展其在艺术创作、影视媒体、工业设计、艺术教育等领域的应用边界。将以图生图和以文生图两类AI绘画任务作为主线,深入分析了代表性模型及其关键技术和方法。对于以图生图方式,从基于自编码器和基于生成式对抗网络两类模型分别探讨了各自的发展脉络、生成原理以及优缺点,并总结了它们在公共数据集上的效果;对于以文生图方式,归纳了基于扩散模型等三类模型的结构区别,以及在三个数据集上各类模型的生成效果,同时指出利用扩散模型的以文生图方式已成为当下的热点,并预示着未来图像生成方式的多样化发展。对目前主流的AI绘画平台从使用方式、生成速度等角度进行了对比总结。最后在总结AI绘画在技术层面和社会层面所面临的问题与争议的基础上,展望了AI绘画与人类艺术家的互补发展、绘画过程互动性增强以及新职业和产业的出现等未来趋势。

关键词: AI绘画, 以图生图, 以文生图, 图像生成, 人工智能生成内容(AIGC)