BMTA: Inpainting of Large Area Damaged Images in Multiple Scenarios

doi:10.3778/j.issn.1673-9418.2406095

Abstract

Abstract: Image inpainting often suffers from incoherent semantic relationships between pixels and from poor restoration of local texture details when large regions of an image are damaged. To address these problems, this paper proposes BMTA (block of multi-transformer attention), a single-stage image inpainting network for repairing large damaged areas in images from multiple scenes, so that the repaired images perform well in both subjective visual quality and objective evaluation metrics. The generator intersperses dual unidirectional attention modules among its convolutional layers to compress and reconstruct the features of the input image and to strengthen important feature information. The compressed features are split by channel for local and global feature extraction: split stripe windows establish global information connections, residual dense blocks extract local detail information in depth, and the extracted features are then fused. In the decoder, a gated linear self-attention module is used to prevent the loss of local information during decoding and inaccurate understanding of context during inpainting; it preserves information at multiple levels of the network so that the result is closer to the original image. Finally, a discriminator evaluates the inpainting results, encouraging the repaired images to exhibit better structure and texture. On the CelebA, StreetView, and Places2 datasets, the proposed method outperforms current advanced image inpainting algorithms.

Key words: image inpainting, attention mechanism, Transformer, feature extraction
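
The abstract names a gated linear self-attention module in the decoder but does not give its formulation. As a rough illustration only, the NumPy sketch below shows one common way such a module can be built: linear attention with a positive feature map, followed by an element-wise sigmoid gate. The function names (feature_map, gated_linear_attention), the elu(x) + 1 feature map, the sigmoid gate, and the weight matrices are all assumptions made for this example; they are not taken from the BMTA paper.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: a positive feature map often used in linear attention.
    # (Assumed here; the paper's choice is not stated in the abstract.)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def gated_linear_attention(x, w_q, w_k, w_v, w_g):
    """Hypothetical gated linear self-attention over n tokens of width d.

    x: (n, d) token features; w_q, w_k, w_v, w_g: (d, d) projections.
    """
    q = feature_map(x @ w_q)            # (n, d), strictly positive
    k = feature_map(x @ w_k)            # (n, d), strictly positive
    v = x @ w_v                         # (n, d)

    # Linear attention: aggregate keys and values once, giving a (d, d)
    # summary, instead of forming the (n, n) attention matrix.
    kv = k.T @ v                        # (d, d)
    z = q @ k.sum(axis=0)               # (n,) per-token normaliser
    attn = (q @ kv) / z[:, None]        # (n, d)

    # Sigmoid gate computed from the input controls how much of the
    # attention output is passed on for each token and channel.
    gate = 1.0 / (1.0 + np.exp(-(x @ w_g)))
    return gate * attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((16, 8))   # e.g. 16 flattened positions
    weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]
    out = gated_linear_attention(tokens, *weights)
    print(out.shape)
```

Because the key-value summary is computed once, the cost grows linearly with the number of tokens rather than quadratically, which is the usual motivation for linear attention on high-resolution feature maps.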

CAO Yan, XIN Zihao, WU Kaijun, SHAN Hongquan, GUO Bingsen. BMTA: Inpainting of Large Area Damaged Images in Multiple Scenarios[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(6): 1553-1563.
