Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (1): 140-153.DOI: 10.3778/j.issn.1673-9418.2105060

• Graphics·Image •

Cross-Modal Fusion of RGB-D Salient Detection for Advanced Semantic Repair Strategy

SHI Yucheng, WU Yun, LONG Huiyun   

  1. School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Online: 2023-01-01  Published: 2023-01-01




Abstract: To address the problems of incomplete localization of salient regions and blurred edges, this paper proposes an RGB-D salient object detection method. First, a cross-modal feature fusion module is designed to integrate RGB and depth information layer by layer, yielding six fused multi-modal feature outputs. This module reduces the discrepancy between RGB and depth information and provides deep features with greater commonality and complementarity for the subsequent advanced semantic repair. Based on the multi-level information obtained from this module, the features of the last three layers are jointly used to extract richer high-level semantic information, from which an initial saliency map is obtained. A U-Net structure is then adopted to fuse features from the top of the network downward: after upsampling, each layer is concatenated with the next layer along the channel dimension, and the low-level features of the first three layers are guided by the high-level semantic features both before and after fusion, completing the repair of the low-level features. Finally, the final saliency map is produced. The proposed cross-modal feature fusion module adaptively fuses multi-modal features, highlights their commonality and complementarity, and reduces fusion ambiguity. The proposed advanced semantic repair strategy helps to detect salient regions accurately and improves edge clarity. Experimental results on five datasets, NJU2K, NLPR, STERE, DES and SIP, show that the proposed algorithm outperforms most leading methods and achieves relatively advanced performance.
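The top-down decoding described above (upsample each layer, then concatenate with the next shallower layer along the channel dimension) can be sketched in plain NumPy. This is a minimal illustration under assumed shapes and a hypothetical fusion rule, not the paper's actual network: `fuse_cross_modal` stands in for the learned cross-modal fusion module, and `upsample2x` uses simple nearest-neighbor interpolation.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_cross_modal(rgb_feat, depth_feat, alpha=0.5):
    # Hypothetical stand-in for the learned fusion module: a weighted
    # sum captures commonality, an element-wise product captures
    # complementary responses shared by both modalities.
    common = alpha * rgb_feat + (1.0 - alpha) * depth_feat
    complementary = rgb_feat * depth_feat
    return common + complementary

def topdown_decode(feats):
    # feats: fused feature maps ordered deep -> shallow, each shallower
    # map having twice the spatial resolution of the previous one.
    # Each step upsamples the running output and concatenates it with
    # the next layer along the channel axis, as in a U-Net decoder.
    out = feats[0]
    for f in feats[1:]:
        out = np.concatenate([upsample2x(out), f], axis=0)
    return out

# Toy pyramid: three fused levels with 4 channels each.
f_deep = np.ones((4, 4, 4))
f_mid = np.ones((4, 8, 8))
f_shallow = np.ones((4, 16, 16))
decoded = topdown_decode([f_deep, f_mid, f_shallow])
```

After two concatenation steps the channel count grows 4 → 8 → 12 while the spatial size reaches that of the shallowest layer, so `decoded` has shape (12, 16, 16); in the actual network, convolutions after each concatenation would reduce channels and apply the semantic guidance.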

Key words: RGB-D, salient object detection, cross-modal fusion, advanced semantic repair

