Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (4): 989-1000. DOI: 10.3778/j.issn.1673-9418.2403082

• Graphics and Image •


Cross-Modal Multi-level Feature Fusion for Semantic Segmentation of Remote Sensing Images

LI Zhijie, CHENG Xin, LI Changhua, GAO Yuan, XUE Jingyu, JIE Jun   

  1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
    2. School of Architecture, Xi'an University of Architecture and Technology, Xi'an 710055, China
  • Online: 2025-04-01 Published: 2025-03-28



Abstract: Multimodal semantic segmentation networks can leverage complementary information from different modalities to improve segmentation accuracy. Thus, they are highly promising for land cover classification. However, existing multimodal remote sensing image semantic segmentation models often overlook the geometric shape information of deep features and fail to fully utilize multi-layer features before fusion. This results in insufficient cross-modal feature extraction and suboptimal fusion effects. To address these issues, a remote sensing image semantic segmentation model based on multimodal feature extraction and multi-layer feature fusion is proposed. By constructing a dual-branch encoder, the model can separately extract spectral information from remote sensing images and elevation information from the normalized digital surface model (nDSM), and deeply explore the geometric shape information of the nDSM. Furthermore, a cross-layer enrichment module is introduced to refine and enhance each layer's features, making full use of multi-layer feature information from deep to shallow layers. The refined features are then processed through an attention feature fusion module for differential complementarity and cross-fusion, mitigating the differences between branch structures and fully exploiting the advantages of multimodal features, thereby improving the segmentation accuracy of remote sensing images. Experiments conducted on the ISPRS Vaihingen and Potsdam datasets demonstrate mF1 scores of 90.88% and 93.41%, respectively, and mean intersection over union (mIoU) scores of 83.49% and 87.85%, respectively. Compared with current mainstream algorithms, this model achieves more accurate semantic segmentation of remote sensing images.
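The reported mF1 and mIoU figures follow the standard per-class definitions computed from a pixel-level confusion matrix. A minimal sketch of these metrics (the function names, class count, and test labels below are illustrative, not from the paper):

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Pixel-level confusion matrix: rows = ground truth, cols = prediction."""
    mask = (gt >= 0) & (gt < num_classes)          # ignore out-of-range labels
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_mf1(cm):
    """Mean IoU and mean F1 over classes, from a confusion matrix."""
    tp = np.diag(cm).astype(float)                 # correctly labeled pixels per class
    fp = cm.sum(axis=0) - tp                       # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp                       # pixels of the class that were missed
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou.mean(), f1.mean()
```

In evaluation, the confusion matrices of all test tiles are typically accumulated before the per-class scores are averaged.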

Key words: remote sensing images, normalized digital surface model (nDSM), semantic segmentation, feature extraction, feature fusion
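The abstract describes an attention feature fusion module that lets the spectral and elevation branches complement each other; the paper's exact design is not given here, so the following is only a generic stand-in for such cross-modal gating (all names and the sigmoid-gate formulation are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_rgb, f_ndsm):
    """Illustrative cross-modal gating: each modality's feature map is
    re-weighted by a gate derived from the other modality, then summed,
    so that one branch can emphasize or suppress regions of the other."""
    g_rgb = sigmoid(f_ndsm)            # elevation stream gates the spectral stream
    g_ndsm = sigmoid(f_rgb)            # spectral stream gates the elevation stream
    return g_rgb * f_rgb + g_ndsm * f_ndsm
```

In a real network the gates would be produced by learned attention layers rather than a raw sigmoid over the feature maps; this sketch only conveys the differential-complementarity idea.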