Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (6): 1404-1420. DOI: 10.3778/j.issn.1673-9418.2401075
• Frontiers·Surveys •
Survey of AI Painting
ZHANG Zeyu, WANG Tiejun, GUO Xiaoran, LONG Zhilei, XU Kui
Online: 2024-06-01
Published: 2024-05-31
ZHANG Zeyu, WANG Tiejun, GUO Xiaoran, LONG Zhilei, XU Kui. Survey of AI Painting[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1404-1420.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2401075
• Related Articles •
[1] MA Li, ZOU Yali. Aesthetic feature image generation method embedded with self-attention mechanism[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(9): 1728-1739.
[2] MA Yongjie, XU Xiaodong, ZHANG Ru, XIE Yirong, CHEN Hong. Generative adversarial network and its research progress in image generation[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(10): 1795-1811.