Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (3): 707-717. DOI: 10.3778/j.issn.1673-9418.2209110

• Graphics · Image •


MFFNet: Image Semantic Segmentation Network of Multi-level Feature Fusion

WANG Yan, NAN Peiqi   

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Online: 2024-03-01 Published: 2024-03-01



Abstract: In image semantic segmentation, most methods upsample directly without fully exploiting features at different scales and levels, so some effective information is discarded as redundant, reducing the accuracy and sensitivity of segmentation for small and similar categories. To address this, a multi-level feature fusion network (MFFNet) is proposed. MFFNet adopts an encoder-decoder structure. In the encoding stage, context information and spatial detail information are obtained through a context information extraction path and a spatial information extraction path, respectively, enhancing inter-pixel correlation and boundary accuracy. In the decoding stage, a multi-level feature fusion path is designed: a mixed bilateral fusion module fuses context information, a high-low feature fusion module fuses deep information with spatial information, and a global channel fusion module captures the relationships between different channels to achieve global fusion of multi-scale information. MFFNet achieves mean intersection over union (MIoU) of 80.70% on the PASCAL VOC 2012 validation set and 76.33% on the Cityscapes validation set, yielding strong segmentation results.
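The high-low feature fusion described above follows a common pattern: upsample low-resolution deep (context) features to the resolution of shallow (spatial) features, concatenate along channels, and mix with a 1×1 convolution. The following is a minimal NumPy sketch of that general pattern, not the authors' implementation; the function names, nearest-neighbor upsampling, and random channel-mixing weights are illustrative assumptions:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_high_low(deep, shallow, w):
    """Illustrative high-low fusion: upsample deep features to the
    shallow resolution, concatenate along channels, then mix channels
    with a 1x1 convolution (a per-pixel linear map over channels)."""
    factor = shallow.shape[1] // deep.shape[1]
    up = upsample_nearest(deep, factor)          # (Cd, H, W)
    cat = np.concatenate([up, shallow], axis=0)  # (Cd + Cs, H, W)
    c, h, width = cat.shape
    fused = (w @ cat.reshape(c, -1)).reshape(-1, h, width)
    return fused

rng = np.random.default_rng(0)
deep = rng.standard_normal((8, 4, 4))       # low-res context features
shallow = rng.standard_normal((4, 16, 16))  # high-res spatial features
w = rng.standard_normal((6, 12))            # 12 -> 6 channel mixer
out = fuse_high_low(deep, shallow, w)
print(out.shape)  # (6, 16, 16)
```

In a real network the channel mixer would be a learned `1x1` convolution followed by normalization and a nonlinearity, and the upsampling would typically be bilinear rather than nearest-neighbor.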

Key words: encoder-decoder, context information, spatial information, feature fusion