Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2023, Vol. 17, Issue (8): 1981-1994. DOI: 10.3778/j.issn.1673-9418.2203118

• Network & Security •

Cascaded Two-Stream Attention Networks for Traceability Analysis of Copy-Move Images

JI Yanqing, ZHANG Yujin   

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Online:2023-08-01 Published:2023-08-01

Abstract: Copy-move is a common form of image forgery. Traditional copy-move forensic methods concentrate on locating the tampered regions, whereas accurately distinguishing the copied source region from the pasted target region remains a difficult problem in image forensics. Existing algorithms that locate the source and target regions directly from the original forged image still have shortcomings. Building on these algorithms, this paper proposes a cascaded two-stream attention network for traceability analysis of copy-move images. The network works in two stages. The first stage consists of an encoding network, a feature analysis module and a decoding network. In the encoder, the lightweight MobileNetV2 serves as the backbone and extracts shallow and deep image features, which form the two output streams. In the feature analysis module, a similar-feature attention mechanism and an atrous spatial pyramid pooling (ASPP) module capture tampered regions in the deep features at multiple scales, while the shallow-feature branch improves the segmentation of edges and fine details of the tampered regions. In the decoder, each pixel of the feature map is classified and the prediction is upsampled. The second stage separates the source from the target among the tampered regions detected by the first stage. It also uses a two-stream structure: one branch takes the original image blocks containing the source or target region, and the other takes the noise maps extracted from those blocks. The extracted block features are fused to predict the category, and pixel-level localization is finally obtained by region mapping. Experimental results show that the proposed network not only locates tampered regions effectively but also distinguishes the copy-move source from the target well.
Compared with the latest model of the same structure, the first stage of the network improves performance on the test set and two public datasets by 9.4, 2.6 and 2.5 percentage points respectively, and the end-to-end performance on the test set improves by 12.03%. The network also shows better robustness to common image post-processing operations.
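The first-stage feature analysis summarized above (similarity attention over deep features, ASPP, and fusion of shallow and deep features before per-pixel prediction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the MobileNetV2 backbone is omitted and random arrays stand in for its features; the similarity attention is assumed to be a self-correlation of L2-normalized feature vectors followed by top-k score pooling, close in spirit to the self-correlation used in BusterNet; the ASPP rates (1, 6, 12, 18) follow DeepLabv3 conventions; and the function names `similarity_attention`, `dilated_conv2d`, `aspp` and `decode` are hypothetical.

```python
import numpy as np


def similarity_attention(feat: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Self-correlation of a deep feature map of shape (H, W, C).

    Returns (H, W, top_k): for each position, its top_k cosine similarities to
    the *other* positions, in descending order. A copy-moved patch scores close
    to 1 because it matches another part of the same image.
    """
    h, w, c = feat.shape
    f = feat.reshape(h * w, c)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)  # L2-normalise
    sim = f @ f.T                         # (N, N) cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)        # drop the trivial match with itself
    top = np.sort(sim, axis=1)[:, ::-1][:, :top_k]  # largest scores first
    return top.reshape(h, w, top_k)


def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, rate: int) -> np.ndarray:
    """'Same'-padded single-channel 2-D atrous (dilated) cross-correlation."""
    kh, kw = kernel.shape                 # odd kernel sizes assumed
    ph, pw = rate * (kh // 2), rate * (kw // 2)
    xp = np.pad(x.astype(float), ((ph, ph), (pw, pw)))  # zero padding
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            # each kernel tap reads the input shifted by a multiple of `rate`
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def aspp(x: np.ndarray, kernel: np.ndarray, rates=(1, 6, 12, 18)) -> np.ndarray:
    """ASPP-style block: parallel atrous branches plus an image-level pooling branch.

    One shared kernel keeps the sketch short; a trained ASPP learns separate
    weights per branch (and uses a 1x1 conv where this sketch uses rate 1).
    """
    branches = [dilated_conv2d(x, kernel, r) for r in rates]
    branches.append(np.full(x.shape, x.mean()))  # global average pooling branch
    return np.stack(branches, axis=-1)           # (H, W, len(rates) + 1)


def decode(deep, shallow, weights, out_scale):
    """Fuse upsampled deep features with shallow ones and classify every pixel.

    Nearest-neighbour upsampling stands in for the bilinear upsampling a real
    decoder would use; `weights` acts as a 1x1 conv (per-pixel linear map).
    """
    s = shallow.shape[0] // deep.shape[0]
    up = deep.repeat(s, axis=0).repeat(s, axis=1)   # deep map to shallow size
    fused = np.concatenate([up, shallow], axis=-1)  # (H, W, C_deep + C_shallow)
    labels = (fused @ weights).argmax(axis=-1)      # per-pixel class prediction
    return labels.repeat(out_scale, axis=0).repeat(out_scale, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((16, 16, 64))
    feat[10:13, 10:13] = feat[2:5, 2:5]  # simulate a copy-moved 3x3 patch
    att = similarity_attention(feat)
    # both the source patch and its copy score ~1.0; untouched positions score lower
    print(att[3, 3, 0], att[11, 11, 0], att[0, 15, 0])
```

In this toy run, positions in the source patch and in its copy both receive a top-1 score near 1. A similarity map alone therefore flags both regions without telling which one is the source, which is the ambiguity the second stage is meant to resolve.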

Key words: image tampering region localization, cascaded two-stream network, feature fusion, attention mechanism, image noise extraction
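The abstract's second stage (two-stream classification of each detected region from its image block and noise map, followed by region mapping back to a pixel-level mask) can be pictured with the NumPy sketch below, under several assumptions. The abstract does not say how the noise map is extracted, so a simple 3×3 high-pass residual (pixel minus local mean, in the spirit of SRM-style filters) stands in for it. Detected regions are assumed to be the 4-connected components of the first-stage mask. `fuse_and_classify` is a toy, hand-weighted stand-in for the learned two-branch classifier. All names (`noise_residual`, `connected_regions`, `fuse_and_classify`, `region_mapping`, `SOURCE`, `TARGET`) are hypothetical and not taken from the paper.

```python
from collections import deque

import numpy as np

SOURCE, TARGET = 1, 2  # hypothetical label ids in the output mask (0 = background)


def noise_residual(gray: np.ndarray) -> np.ndarray:
    """Noise map as a high-pass residual: each pixel minus its 3x3 neighbourhood mean."""
    g = gray.astype(float)
    p = np.pad(g, 1, mode="edge")
    h, w = g.shape
    local_mean = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return g - local_mean


def connected_regions(mask: np.ndarray) -> list:
    """4-connected components of a binary mask, each as a list of (row, col) pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def fuse_and_classify(rgb_block, noise_block, weights, bias=0.0):
    """Toy two-stream head: concatenate per-stream features, then a linear decision.

    Hand-crafted statistics stand in for the features a trained branch would learn.
    """
    rgb_feat = np.array([rgb_block.mean(), rgb_block.std()])
    noise_feat = np.array([np.abs(noise_block).mean()])
    fused = np.concatenate([rgb_feat, noise_feat])  # feature fusion by concatenation
    return SOURCE if fused @ weights + bias < 0 else TARGET


def region_mapping(image, mask, classify_block):
    """Classify every detected region as source/target and paint the label back pixel-wise."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    noise = noise_residual(gray)
    out = np.zeros(mask.shape, dtype=np.uint8)
    for pixels in connected_regions(mask):
        ys = np.array([p[0] for p in pixels])
        xs = np.array([p[1] for p in pixels])
        box = (slice(ys.min(), ys.max() + 1), slice(xs.min(), xs.max() + 1))
        # two-stream input: the original image block and its noise-map block
        out[ys, xs] = classify_block(image[box], noise[box])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))
    mask = np.zeros((32, 32), dtype=bool)
    mask[2:8, 2:8] = True        # two regions reported by the first stage
    mask[20:26, 20:26] = True
    w = np.array([0.0, 0.0, 1.0])  # hypothetical weights: decide on noise energy only
    result = region_mapping(img, mask, lambda r, n: fuse_and_classify(r, n, w, bias=-0.1))
    print(np.unique(result))
```

Because the class decided for a block is written to every pixel of its connected component, the output is a pixel-level source/target mask even though classification happens once per block. Concatenation is only the simplest fusion choice; the paper may fuse the two streams differently.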