Journal of Frontiers of Computer Science and Technology


BMTA: Inpainting of Large-Area Damaged Images in Multiple Scenarios

CAO Yan,  XIN Zihao,  WU Kaijun,  SHAN Hongquan,  GUO Bingsen   

  1. School of Electronic and Information Engineering,  Lanzhou Jiaotong University,  Lanzhou 730070,  China

Abstract: To address the incoherent semantic connections between image pixels and the poor restoration of local texture details in images with large damaged regions, this paper proposes a single-stage image inpainting network named BMTA (Block of Multi Transformer Attention). It repairs large-area damaged images across multiple scenes, so that the restored images perform well both in subjective human perception and on objective evaluation metrics. The generator interleaves dual unidirectional attention modules with the convolution layers to compress the features of the input image, reconstruct them, and reinforce the important feature information. The compressed features are then split along the channel dimension into a local branch and a global branch: segmented striped windows establish global information connections, residual dense blocks extract local detail information in depth, and the features from the two branches are fused. In the decoder, to prevent the loss of local information during decoding and the misinterpretation of contextual information during restoration, a gated linear self-attention module preserves information at multiple levels of the network, yielding results closer to the original image. Finally, a discriminator evaluates the restoration results, pushing the restored images toward better structure and texture. The proposed method outperforms current state-of-the-art image inpainting algorithms on the CelebA, StreetView, and Places2 datasets.
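The abstract names two building blocks without giving their layer definitions, so the following is a minimal PyTorch sketch of how such components are commonly built: a residual dense block standing in for the local-detail branch and a gated linear (softmax-free) self-attention module standing in for the decoder's gated attention. All class names, channel sizes, the gating mechanism, and the normalization choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualDenseBlock(nn.Module):
    """Densely connected 3x3 convolutions with a residual skip,
    a common form of the 'residual dense block' used for local detail."""

    def __init__(self, channels: int = 64, growth: int = 32, n_layers: int = 4):
        super().__init__()
        # Each layer sees the input plus all previously produced feature maps.
        self.layers = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1)
            for i in range(n_layers)
        )
        # 1x1 fusion conv maps the concatenated features back to `channels`.
        self.fuse = nn.Conv2d(channels + n_layers * growth, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.layers:
            feats.append(F.relu(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))  # residual connection


class GatedLinearSelfAttention(nn.Module):
    """Linear-complexity self-attention whose output is modulated by a learned
    sigmoid gate, so the decoder can keep or suppress contextual features."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.to_qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)                 # each (b, c, h, w)
        q = q.flatten(2).transpose(1, 2).softmax(dim=-1)         # (b, hw, c), over channels
        k = k.flatten(2).softmax(dim=-1)                         # (b, c, hw), over positions
        v = v.flatten(2).transpose(1, 2)                         # (b, hw, c)
        context = k @ v                                          # (b, c, c), linear in hw
        out = (q @ context).transpose(1, 2).reshape(b, c, h, w)  # back to feature map
        out = self.proj(out) * torch.sigmoid(self.gate(x))       # sigmoid gate on the output
        return x + out


if __name__ == "__main__":
    feats = torch.randn(1, 64, 32, 32)
    out = GatedLinearSelfAttention(64)(ResidualDenseBlock(64)(feats))
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

The attention sketch follows the usual "efficient attention" factorization (keys aggregated against values first), which keeps memory linear in the number of pixels; whether BMTA uses this exact factorization or a different linearization is not stated in the abstract.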

Key words: Image inpainting, Attention mechanism, Transformer, Feature extraction
