Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (3): 707-717. DOI: 10.3778/j.issn.1673-9418.2209110

• Graphics · Image •


MFFNet: Image Semantic Segmentation Network of Multi-level Feature Fusion

WANG Yan, NAN Peiqi   

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Online: 2024-03-01 Published: 2024-03-01



Abstract: In image semantic segmentation, most methods upsample directly without fully exploiting features at different scales and levels, so some effective information is discarded as redundant, reducing the accuracy and sensitivity of segmentation for small and similar categories. To address this, a multi-level feature fusion network (MFFNet) is proposed. MFFNet adopts an encoder-decoder structure. In the encoding stage, context information and spatial detail information are obtained through a context information extraction path and a spatial information extraction path, respectively, enhancing inter-pixel correlation and boundary accuracy. In the decoding stage, a multi-level feature fusion path is designed: a mixed bilateral fusion module fuses context information, a high-low feature fusion module fuses deep information with spatial information, and a global channel fusion module captures the relationships between different channels to achieve global fusion of multi-scale information. MFFNet achieves mean intersection over union (MIoU) of 80.70% on the PASCAL VOC 2012 validation set and 76.33% on the Cityscapes validation set, yielding strong segmentation results.
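The high-low feature fusion described above follows a common pattern: upsample low-resolution deep (context) features to the resolution of shallow (spatial) features, concatenate along channels, and mix with a 1×1 convolution. The following is a minimal NumPy sketch of that general pattern, not the authors' implementation; the function names, nearest-neighbor upsampling, and random channel-mixing weights are illustrative assumptions:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_high_low(deep, shallow, w):
    """Illustrative high-low fusion: upsample deep features to the
    shallow resolution, concatenate along channels, then mix channels
    with a 1x1 convolution (a per-pixel linear map over channels)."""
    factor = shallow.shape[1] // deep.shape[1]
    up = upsample_nearest(deep, factor)          # (Cd, H, W)
    cat = np.concatenate([up, shallow], axis=0)  # (Cd + Cs, H, W)
    c, h, width = cat.shape
    fused = (w @ cat.reshape(c, -1)).reshape(-1, h, width)
    return fused

rng = np.random.default_rng(0)
deep = rng.standard_normal((8, 4, 4))       # low-res context features
shallow = rng.standard_normal((4, 16, 16))  # high-res spatial features
w = rng.standard_normal((6, 12))            # 12 -> 6 channel mixer
out = fuse_high_low(deep, shallow, w)
print(out.shape)  # (6, 16, 16)
```

In a real network the channel mixer would be a learned `1x1` convolution followed by normalization and a nonlinearity, and the upsampling would typically be bilinear rather than nearest-neighbor.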

Key words: encoder-decoder, context information, spatial information, feature fusion