[1] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]//Advances in Neural Information Processing Systems 33, 2020: 6840-6851.
[2] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Advances in Neural Information Processing Systems 27, 2014: 2672-2680.
[3] NICHOL A, DHARIWAL P, RAMESH A, et al. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2112.10741.
[4] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10674-10685.
[5] RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical text-conditional image generation with CLIP latents[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2204.06125.
[6] RunwayML. stable-diffusion-v1-5: stable diffusion v1.5 model card[EB/OL]. (2023-08-24)[2024-06-30]. https://huggingface.co/runwayml/stable-diffusion-v1-5.
[7] 刘泽润, 尹宇飞, 薛文灏, 等. 基于扩散模型的条件引导图像生成综述[J]. 浙江大学学报(理学版), 2023, 50(6): 651-667.
LIU Z R, YIN Y F, XUE W H, et al. A review of conditional image generation based on diffusion models[J]. Journal of Zhejiang University (Science Edition), 2023, 50(6): 651-667.
[8] WANG Z X, ZHANG Z Y, ZHANG X Y, et al. DR2: diffusion-based robust degradation remover for blind face restoration[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 1704-1713.
[9] KIM M, LIU F, JAIN A, et al. DCFace: synthetic face generation with dual condition diffusion model[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 12715-12725.
[10] HUANG Z Q, CHAN K C K, JIANG Y M, et al. Collaborative diffusion for multi-modal face generation and editing[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 6080-6090.
[11] Lambdalabs. sd-image-variations-diffusers: stable diffusion image variations model card[EB/OL]. (2023-02-08)[2024-06-30]. https://huggingface.co/lambdalabs/sd-image-variations-diffusers.
[12] RUIZ N, LI Y Z, JAMPANI V, et al. DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 22500-22510.
[13] ZHANG L M, RAO A Y, AGRAWALA M. Adding conditional control to text-to-image diffusion models[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 3813-3824.
[14] MOU C, WANG X T, XIE L B, et al. T2I-Adapter: learning adapters to dig out more controllable ability for text-to-image diffusion models[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(5): 4296-4304.
[15] YE H, ZHANG J, LIU S, et al. IP-Adapter: text compatible image prompt adapter for text-to-image diffusion models[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2308.06721.
[16] KARRAS T, AITTALA M, LAINE S, et al. Alias-free generative adversarial networks[C]//Advances in Neural Information Processing Systems 34, 2021: 852-863.
[17] PATASHNIK O, WU Z Z, SHECHTMAN E, et al. StyleCLIP: text-driven manipulation of StyleGAN imagery[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 2065-2074.
[18] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 8748-8763.
[19] VAN DEN OORD A, VINYALS O, KAVUKCUOGLU K. Neural discrete representation learning[C]//Advances in Neural Information Processing Systems 30, 2017: 6309-6318.
[20] ESSER P, ROMBACH R, OMMER B. Taming transformers for high-resolution image synthesis[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 12868-12878.
[21] KIM G, KWON T, YE J C. DiffusionCLIP: text-guided diffusion models for robust image manipulation[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 2416-2425.
[22] SOHL-DICKSTEIN J, WEISS E, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]//Proceedings of the 32nd International Conference on Machine Learning, 2015: 2256-2265.
[23] SONG J, MENG C, ERMON S. Denoising diffusion implicit models[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2010.02502.
[24] DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[C]//Advances in Neural Information Processing Systems 34, 2021: 8780-8794.
[25] BALAJI Y, NAH S, HUANG X, et al. eDiff-I: text-to-image diffusion models with an ensemble of expert denoisers[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2211.01324.
[26] XU X Q, WANG Z Y, ZHANG E, et al. Versatile diffusion: text, images and variations all in one diffusion model[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 7720-7731.
[27] HUANG L, CHEN D, LIU Y, et al. Composer: creative and controllable image synthesis with composable conditions[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2302.09778.
[28] HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 2790-2799.
[29] ZHAO S, CHEN D, CHEN Y, et al. Uni-ControlNet: all-in-one control to text-to-image diffusion models[C]//Advances in Neural Information Processing Systems 36, 2023.
[30] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, 2017: 5998-6008.
[31] WANG Z H, LIU X H, LI H S, et al. CAMP: cross-modal adaptive message passing for text-image retrieval[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 5763-5772.
[32] BORJI A. Generated faces in the wild: quantitative comparison of stable diffusion, midjourney and DALL-E 2[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2210.00586.
[33] SCHUHMANN C, BEAUMONT R, VENCU R, et al. LAION-5B: an open large-scale dataset for training next generation image-text models[C]//Advances in Neural Information Processing Systems 35, 2022: 25278-25294.
[34] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
[35] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[EB/OL]. [2024-06-30]. https://arxiv.org/abs/1711.05101.
[36] LEE C H, LIU Z, WU L, et al. MaskGAN: towards diverse and interactive facial image manipulation[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5549-5558.
[37] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 586-595.
[38] HESSEL J, HOLTZMAN A, FORBES M, et al. CLIPScore: a reference-free evaluation metric for image captioning[EB/OL]. [2024-06-30]. https://arxiv.org/abs/2104.08718.
[39] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems 25, 2012: 1097-1105.
[40] CompVis. stable-diffusion-v1-4: stable diffusion v1-4 model card[EB/OL]. (2023-08-24)[2024-06-30]. https://huggingface.co/CompVis/stable-diffusion-v1-4.
[41] Imagepipeline. Realistic-Vision-V6.0 model card[EB/OL]. (2024-01-13)[2024-06-30]. https://huggingface.co/imagepipeline/Realistic-Vision-V6.0.
[42] Stablediffusionapi. anything-v5: anything V5 API inference[EB/OL]. (2023-07-05)[2024-06-30]. https://huggingface.co/stablediffusionapi/anything-v5.