Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (1): 140-153. DOI: 10.3778/j.issn.1673-9418.2105060

• Graphics·Image

Cross-Modal Fusion of RGB-D Salient Detection for Advanced Semantic Repair Strategy

SHI Yucheng, WU Yun, LONG Huiyun   

  1. School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Online: 2023-01-01 Published: 2023-01-01

高级语义修复策略的跨模态融合RGB-D显著性检测

石玉诚,吴云,龙慧云   

  1. 贵州大学 计算机科学与技术学院,贵阳 550025

Abstract: To address incomplete localization and blurred edges of salient regions, this paper proposes an RGB-D salient object detection method. First, a cross-modal feature fusion module is designed to fuse RGB and depth information layer by layer, yielding six cross-modal fused feature outputs. This module reduces the discrepancy between RGB and depth information and provides deep features with more commonality and complementarity for the subsequent advanced semantic repair. Based on the multi-level information obtained from this module, the last three layers of features are used to jointly extract richer high-level semantic information, from which an initial saliency map is obtained. Then, a U-Net structure is adopted to fuse features from the top of the network downward: after upsampling, each layer is fused with the next layer along the channel dimension. The first three layers of low-level features are guided by the high-level semantic features both before and after fusion, which completes the repair of the low-level features. Finally, the final saliency map is obtained. The proposed cross-modal feature fusion module adaptively fuses multi-modal features, highlights the commonality and complementarity of the fused features, and reduces the ambiguity of fusion. The proposed advanced semantic repair strategy helps to detect salient regions accurately and to improve edge clarity. Experimental results show that the proposed algorithm outperforms most strong existing methods on five datasets (NJU2K, NLPR, STERE, DES and SIP) and achieves relatively advanced performance.
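
The pipeline summarized in the abstract can be illustrated with a small NumPy sketch. It is not the authors' network: the abstract does not specify the fusion formula, the backbone, or the guidance operator, so everything below is an assumption made for illustration. The names `fuse_modalities`, `resize_to`, and `top_down_decode` are hypothetical, the gate is a simple sigmoid of the channel-mean difference, each of the six levels is assumed to halve the spatial resolution, and a plain average stands in for the learned convolution that would normally follow channel concatenation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_modalities(rgb, depth):
    """Hypothetical fusion of two (C, H, W) feature maps (not the paper's module).

    The element-wise product stands for what both modalities agree on,
    and a gated sum stands for complementary information.
    """
    common = rgb * depth
    # Assumed gate: one weight per pixel from the channel-mean difference.
    gate = sigmoid(rgb.mean(axis=0, keepdims=True) - depth.mean(axis=0, keepdims=True))
    complementary = gate * rgb + (1.0 - gate) * depth
    return common + complementary

def resize_to(x, size):
    """Nearest-neighbour upsampling of (C, H, W) to spatial size `size` (integer factor)."""
    factor = size // x.shape[1]
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def top_down_decode(fused, n_high=3):
    """Decode a list of fused features ordered shallow to deep.

    Returns (initial_map, final_map). Assumes equal channel counts and that
    each level halves the resolution of the previous one.
    """
    # High-level semantics from the last `n_high` levels, averaged at a common size.
    deep = fused[-n_high:]
    target = deep[0].shape[1]
    semantic = np.mean([resize_to(f, target) for f in deep], axis=0)
    initial_map = sigmoid(semantic.mean(axis=0))

    x = fused[-1]
    channels = x.shape[0]
    for level in range(len(fused) - 2, -1, -1):
        skip = fused[level]
        size = skip.shape[1]
        is_low_level = level < len(fused) - n_high
        if is_low_level:
            # Guidance before fusion: weight low-level features by the semantics.
            guide = sigmoid(resize_to(semantic, size).mean(axis=0, keepdims=True))
            skip = skip * guide
        # U-Net style step: upsample, then concatenate along the channel axis.
        merged = np.concatenate([resize_to(x, size), skip], axis=0)
        # Stand-in for a learned convolution: average the two channel groups.
        x = merged.reshape(2, channels, size, size).mean(axis=0)
        if is_low_level:
            # Guidance after fusion.
            x = x * guide
    final_map = sigmoid(x.mean(axis=0))
    return initial_map, final_map
```

With six levels and `n_high=3`, the loop applies the semantic guidance only to the three shallowest levels, mirroring the statement that the first three layers of low-level features are guided before and after fusion. In a trained model the averaging steps would be replaced by learned layers.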

Key words: RGB-D, salient object detection, cross-modal fusion, advanced semantic repair

摘要: 针对显著区域定位不完整以及边缘模糊问题,提出一种RGB-D显著性目标检测方法。该方法首先设计了一个跨模态特征融合模块来逐层融合RGB和Depth信息,并得到六个模态融合特征输出。该模块降低了RGB和Depth信息之间存在的差异性,为后续的高级语义修复提供更具共性和互补性的深层特征;基于上述模块获得的多层次信息,利用后三层特征,联合提取更丰富的高级语义信息,并得到初始显著图。之后,采用U-Net的网络结构,从网络的顶层向下融合,每一层经过上采样之后与下一层进行通道维度上的融合,前三层底层特征在融合前后采用高级语义特征进行指导,以完成对底层特征的修复。最后,得到最终的显著图。提出的跨模态特征融合模块能够自适应地融合多模态特征,突出融合特征的共性和互补性,降低融合的模糊度。提出的高级语义修复策略有助于准确检测出显著区域并提高边缘清晰度。实验结果表明,该算法在NJU2K、NLPR、STERE、DES、SIP五个数据集上均超过大部分优秀的方法,达到了较为先进的性能。

关键词: RGB-D, 显著性目标检测, 跨模态融合, 高级语义修复