Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2023, Vol. 17 ›› Issue (4): 922-932. DOI: 10.3778/j.issn.1673-9418.2106110

• Artificial Intelligence · Pattern Recognition •

多层次特征融合网络的语义分割算法

祁欣,袁非牛,史劲亭,王贵黔   

  1. 上海师范大学 信息与机电工程学院,上海 201400
  2. 江西农业大学 职业师范(技术)学院,南昌 330045
  3. 上海师范大学 数理学院,上海 201418
  • Publication date: 2023-04-01  Release date: 2023-04-01

Semantic Segmentation Algorithm of Multi-level Feature Fusion Network

QI Xin, YUAN Feiniu, SHI Jinting, WANG Guiqian   

  1. School of Electronic and Mechatronic Engineering, Shanghai Normal University, Shanghai 201400, China
  2. Vocational School of Teachers (Technology), Jiangxi Agricultural University, Nanchang 330045, China
  3. School of Mathematics and Physics, Shanghai Normal University, Shanghai 201418, China
  • Online: 2023-04-01  Published: 2023-04-01

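Abstract (translated from Chinese): Because objects appear at multiple scales and high-level semantic information is insufficient, existing algorithms struggle to classify pixels accurately at object boundaries. To address this, a semantic segmentation algorithm based on multi-level feature fusion is proposed. In the decoding stage, three feature extraction branches are designed: a spatial detail branch, a semantic supplement branch, and a context information branch. The spatial detail branch generates the final segmentation map from shallow, relatively high-resolution feature maps, mainly to retain abundant spatial detail. The semantic supplement branch adds more high-level abstract semantic information. The context information branch is mainly responsible for extracting multi-scale global information. In the semantic supplement branch, a feature fusion guidance module (FFGM) is designed to model the pixel correspondence between different feature maps, so that features from different levels are fused effectively. In the spatial detail branch, a self-enhancement module (SEM) is proposed to refine low-level features, aiming to obtain clear object boundaries. In the context information branch, a pyramid pooling module (PPM) is used to obtain multi-scale context information and to address the pixel misclassification caused by the multi-scale nature of objects. Finally, an attention mechanism fuses the feature maps extracted by the three branches, strengthening important features and suppressing non-salient ones. On the mainstream semantic segmentation datasets PASCAL VOC2012 and Cityscapes, the network achieves a mean intersection over union of 81.12% and 74.56%, respectively, clearly outperforming the algorithms it was compared with.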

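Keywords (translated from Chinese): multi-level feature fusion, context information, semantic segmentation, atrous convolution, attention mechanism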

Abstract: Existing methods struggle to achieve accurate results in object boundary regions because objects appear at multiple scales and high-level semantic abstraction is insufficient. To solve this problem, this paper proposes a semantic segmentation algorithm based on multi-level feature fusion. In the decoding stage, three feature extraction branches are designed: a spatial detail branch, a semantic supplement branch, and a context information branch. The spatial detail branch uses high-resolution shallow feature maps to generate the final segmentation map directly, which preserves a large amount of spatial detail. The semantic supplement branch captures high-level semantic abstraction. The context information branch extracts multi-scale information. In the semantic supplement branch, a feature fusion guidance module (FFGM) models the correspondence between pixels on different feature maps, so that features from different levels can be fused effectively. In the spatial detail branch, a self-enhancement module (SEM) refines low-level features to obtain clear boundary regions. In the context information branch, a pyramid pooling module (PPM) gathers multi-scale context information to correct the misclassification of pixels caused by objects at multiple scales. Finally, an attention mechanism fuses the features from the three branches, enhancing important features and suppressing non-salient ones. Experimental results show that the proposed method obtains a mean intersection over union (mIoU) of 81.12% on PASCAL VOC2012 and 74.56% on Cityscapes, clearly outperforming the compared methods.

Key words: multi-level feature fusion, context information, semantic segmentation, atrous convolution, attention mechanism
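
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how mean intersection over union (mIoU) is commonly computed. It is not taken from the paper: the function name `mean_iou`, the flat label-list inputs, and the `ignore_index=255` convention for unlabeled pixels are assumptions made for illustration. Benchmark scores are normally accumulated over all pixels of the evaluation set rather than averaged per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes, from two flat sequences of integer labels.

    Illustrative sketch only (not the paper's evaluation code).
    Assumes every predicted label lies in [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            # Assumed convention: pixels without a ground-truth label are skipped.
            continue
        if p == t:
            # Correct pixel: belongs to both the intersection and the union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the predicted and the true class.
            union[p] += 1
            union[t] += 1
    # Classes that never occur in either prediction or ground truth are left out.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy example with two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Per-class IoU is the number of pixels where prediction and ground truth agree on a class, divided by the number of pixels where either assigns that class. Averaging over classes keeps small classes from being swamped by large ones such as background or road.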