Salient Object Detection Based on Coordinate Attention Feature Pyramid

doi:10.3778/j.issn.1673-9418.2111121

Abstract

Salient object detection aims to locate the visually salient objects in an image and is an important research topic in computer vision. Compared with traditional methods based on hand-crafted features, methods based on fully convolutional neural networks have shown clear advantages. However, two problems remain. In complex scenes, background noise can easily be mistaken for salient objects, which degrades detection performance. In addition, boundary pixels are difficult to detect when the contour of the salient object is complex. To address these problems, this paper proposes a salient object detection algorithm based on a coordinate attention feature pyramid. A feature pyramid network extracts features at different levels, and a feature refinement module is designed to fuse the features of different levels effectively. To reduce background misjudgment, the model adopts a coordinate attention mechanism that increases the weight of salient regions while suppressing background noise. To handle complex boundaries, a boundary pixel awareness loss is designed and combined with multi-level supervision, which helps the network pay more attention to boundary pixels and generate high-quality saliency maps with clear boundaries. Experimental results on five commonly used datasets show that the algorithm achieves better detection performance on five evaluation metrics.

Key words: salient object detection, deep learning, coordinate attention, feature pyramid, boundary awareness
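
The coordinate attention step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It follows the general idea of coordinate attention (pool along each spatial direction, transform, and gate the input with two direction-wise attention maps) and is not the paper's implementation. The function name, the weight shapes, and the use of a plain ReLU in place of batch normalization plus a learned non-linearity are all assumptions made for illustration; the 1x1 convolutions are written as matrix products over the channel axis.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def coordinate_attention(x, w_shared, w_h, w_w):
    """Simplified coordinate attention for one feature map (hypothetical helper).

    x        : array of shape (C, H, W)
    w_shared : (M, C) weights of the shared 1x1 transform (M = reduced channels)
    w_h, w_w : (C, M) weights producing the height and width attention maps
    """
    c, h, w = x.shape
    # Directional average pooling: one descriptor per row and per column.
    pooled_h = x.mean(axis=2)  # (C, H), averaged over width
    pooled_w = x.mean(axis=1)  # (C, W), averaged over height
    # Concatenate along the spatial axis and apply the shared transform.
    y = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    y = np.maximum(w_shared @ y, 0.0)                 # (M, H + W), ReLU stand-in
    # Split back into the two directions and compute gates in (0, 1).
    a_h = sigmoid(w_h @ y[:, :h])  # (C, H)
    a_w = sigmoid(w_w @ y[:, h:])  # (C, W)
    # Re-weight every position by its row gate and its column gate.
    return x * a_h[:, :, None] * a_w[:, None, :]


# Toy usage with random weights (illustrative values only).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 7))
w_shared = 0.1 * rng.standard_normal((4, 8))
w_h = 0.1 * rng.standard_normal((8, 4))
w_w = 0.1 * rng.standard_normal((8, 4))
out = coordinate_attention(x, w_shared, w_h, w_w)
```

Because both gates lie between 0 and 1, each position is attenuated according to its row and its column. In a trained network this is how regions judged salient keep a larger weight than background regions.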

WANG Jianzhe, WU Qin. Salient Object Detection Based on Coordinate Attention Feature Pyramid[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(1): 154-165.
