Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (1): 187-195.DOI: 10.3778/j.issn.1673-9418.2311071

• Graphics·Image •

Semantic Segmentation Algorithm for High Resolution Remote Sensing Images with Dual Encoder

WU Mengke, GAO Xindan   

  1. School of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China
  • Online: 2025-01-01  Published: 2024-12-31


Abstract: Remote sensing images are characterized by multi-scale objects, complex backgrounds, and imbalanced classes. Semantic segmentation algorithms based on convolutional neural networks (CNNs) struggle to capture the global features of an image, which degrades segmentation quality. To address these problems, this paper exploits the global feature extraction capability of Swin Transformer and proposes DEGFNet (dual encoders and global local transformer feature refinement network), a dual-encoder semantic segmentation algorithm for high-resolution remote sensing images. First, a feature fusion block (FFB) is designed to inject the global features captured by Swin Transformer into the encoder, addressing the challenge of multi-scale objects; meanwhile, a spatial interaction block (SIB) is designed within Swin Transformer to reduce the negative impact of complex background samples. Second, a global local transformer block (GLTB) and a feature refinement block (FRB) are introduced in the decoder to make better use of the information extracted by the encoders and to improve segmentation accuracy. Finally, the model is trained with a hybrid loss function combining cross-entropy loss and Dice loss to mitigate the negative impact of class imbalance. On the Vaihingen dataset, the macro-F1 (mF1), mean intersection over union (mIoU), and overall accuracy (OA) reach 91.9%, 84.8%, and 92.4%, respectively; on the LoveDA dataset, the mIoU reaches 55.0%. Both results demonstrate better semantic segmentation performance and good generalization.
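The cross-entropy/Dice combination mentioned in the abstract is a standard remedy for class imbalance: cross entropy averages over pixels (so large classes dominate), while soft Dice averages over classes. A minimal NumPy sketch is shown below; the equal weighting `alpha=0.5` and the smoothing constant are illustrative assumptions, as the paper's abstract does not state them:

```python
import numpy as np

def cross_entropy_loss(probs, onehot, eps=1e-7):
    # Mean pixel-wise cross entropy over an (N, C) array of class probabilities.
    return float(-np.mean(np.sum(onehot * np.log(probs + eps), axis=1)))

def dice_loss(probs, onehot, smooth=1.0):
    # Soft Dice loss averaged over classes; each class contributes equally,
    # which makes it less sensitive to class imbalance than cross entropy.
    inter = np.sum(probs * onehot, axis=0)
    union = np.sum(probs, axis=0) + np.sum(onehot, axis=0)
    dice = (2.0 * inter + smooth) / (union + smooth)
    return float(1.0 - np.mean(dice))

def hybrid_loss(probs, onehot, alpha=0.5):
    # alpha=0.5 (equal weighting) is an assumption for illustration only.
    return alpha * cross_entropy_loss(probs, onehot) + (1.0 - alpha) * dice_loss(probs, onehot)
```

A perfect prediction (`probs` equal to the one-hot labels) drives both terms to approximately zero, while predictions concentrated on the wrong class are penalized by both terms.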

Key words: high-resolution remote sensing images, semantic segmentation, convolutional neural network, Swin Transformer
