Water Body Extraction Method Based on ConvNeXt and Dual Feature Extraction Branch

doi:10.3778/j.issn.1673-9418.2404085

Abstract

Accurately identifying water boundaries in high-resolution remote sensing images is extremely challenging because of the combined effects of complex spectral mixing, blurred boundaries between ground objects, and environmental noise. To address this problem, this paper builds on PSPNet and proposes CoNFM-Net, a water body extraction method based on ConvNeXt and a dual feature extraction branch. In the encoder, ConvNeXt replaces ResNet50 as the backbone; its inverted bottleneck layers, large convolution kernels, and related design choices strengthen the network's feature extraction. In the decoder, a dual-branch structure is designed, consisting of a multi-scale feature fusion branch and a context information enhancement branch. To make effective use of the multi-level feature maps produced by the backbone, the multi-scale feature fusion branch uses a bidirectional feature fusion module (BiFFM) that addresses scale inconsistency in boundary recognition. To make better use of global information, the context information enhancement branch passes the deep feature map output by the backbone through a global context information module (GCIM). The deepest feature map of the multi-scale feature fusion branch is then concatenated with the GCIM output, strengthening the model's ability to capture fine details of water boundaries. Experimental results show that the method achieves a mean intersection over union (mIoU) of 89.64% and an F1-score of 94.32% on the LoveDA dataset, 92.60% and 96.16% on the Gaofen-2 (GF-2) dataset, and 93.72% and 96.73% on the Sentinel-2 dataset. Under the same experimental environment, CoNFM-Net shows certain advantages over semantic segmentation algorithms such as U-Net, DANet, and CMTFNet.

Key words: water body extraction, ConvNeXt, high-resolution remote sensing images, feature fusion, dual feature extraction branch
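The encoder change can be illustrated with a single ConvNeXt block as described by Liu et al. in "A ConvNet for the 2020s": a 7×7 depthwise convolution, LayerNorm, a pointwise expansion to four times the channel width (the inverted bottleneck), GELU, a pointwise projection back, a learnable layer scale, and a residual connection. The NumPy sketch below is a minimal, forward-only illustration under stated assumptions: channels-last (H, W, C) arrays, random weights from the hypothetical helper `init_block_params`, the tanh approximation of GELU, and no stochastic depth. It is not the CoNFM-Net implementation.

```python
import numpy as np

def depthwise_conv2d(x, w, b):
    """Depthwise 2D convolution with zero 'same' padding (odd kernel size).

    x: (H, W, C) feature map; w: (k, k, C), one k x k kernel per channel; b: (C,).
    """
    k = w.shape[0]
    pad = k // 2
    h, wd, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            # Each channel is filtered only by its own kernel (no cross-channel mixing).
            out += xp[i:i + h, j:j + wd, :] * w[i, j, :]
    return out + b

def layer_norm(x, gamma, beta, eps=1e-6):
    """LayerNorm over the channel (last) axis, as in channels-last ConvNeXt."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * gamma + beta

def gelu(x):
    """tanh approximation of GELU (assumption; the exact form uses erf)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def init_block_params(c, k=7, expansion=4, seed=0):
    """Hypothetical random initialisation, for illustration only."""
    rng = np.random.default_rng(seed)
    return {
        "dw_w": rng.normal(0, 0.02, (k, k, c)), "dw_b": np.zeros(c),
        "ln_g": np.ones(c), "ln_b": np.zeros(c),
        "pw1_w": rng.normal(0, 0.02, (c, expansion * c)), "pw1_b": np.zeros(expansion * c),
        "pw2_w": rng.normal(0, 0.02, (expansion * c, c)), "pw2_b": np.zeros(c),
        "gamma": np.full(c, 1e-6),  # layer scale, small initial value
    }

def convnext_block(x, p):
    """dwconv7x7 -> LN -> 1x1 (C->4C) -> GELU -> 1x1 (4C->C) -> layer scale -> + x."""
    y = depthwise_conv2d(x, p["dw_w"], p["dw_b"])  # large-kernel spatial mixing
    y = layer_norm(y, p["ln_g"], p["ln_b"])
    y = gelu(y @ p["pw1_w"] + p["pw1_b"])  # inverted bottleneck: expand channels
    y = y @ p["pw2_w"] + p["pw2_b"]  # project back to C channels
    return x + p["gamma"] * y  # residual connection with layer scale

x = np.random.default_rng(1).standard_normal((8, 8, 16))
out = convnext_block(x, init_block_params(16))
print(out.shape)  # (8, 8, 16)
```

In a full backbone such blocks are stacked in four stages separated by downsampling layers; the per-stage depths and widths depend on the ConvNeXt variant, which the abstract does not specify.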
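The abstract does not detail the internals of GCIM. As one plausible reading, the sketch below assumes a GCNet-style global context block (attention pooling over all spatial positions, a bottleneck transform with LayerNorm and ReLU, and broadcast addition), followed by channel-wise concatenation with the deepest map from the multi-scale feature fusion branch. The reduction ratio `r`, the parameter names, and the helpers `init_gc_params` and `fuse_branches` are hypothetical; this illustrates the idea and is not the authors' module.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def init_gc_params(c, r=4, seed=0):
    """Hypothetical random weights for a GCNet-style block with reduction ratio r."""
    rng = np.random.default_rng(seed)
    return {
        "wk": rng.normal(0, 0.02, c),  # 1x1 conv C -> 1 (attention logits)
        "wv1": rng.normal(0, 0.02, (c, c // r)),  # bottleneck C -> C/r
        "ln_g": np.ones(c // r), "ln_b": np.zeros(c // r),
        "wv2": rng.normal(0, 0.02, (c // r, c)),  # restore C/r -> C
    }

def global_context_block(x, p, eps=1e-5):
    """x: (H, W, C). Returns x plus one global context vector broadcast to every pixel."""
    h, w, c = x.shape
    flat = x.reshape(h * w, c)  # (N, C), N = H*W positions
    attn = softmax(flat @ p["wk"])  # (N,) weights over all positions, summing to 1
    context = attn @ flat  # (C,) attention-pooled global descriptor
    t = context @ p["wv1"]  # (C/r,)
    t = (t - t.mean()) / np.sqrt(t.var() + eps) * p["ln_g"] + p["ln_b"]  # LayerNorm
    t = np.maximum(t, 0.0) @ p["wv2"]  # ReLU, then back to (C,)
    return x + t  # the same context vector is added at every spatial position

def fuse_branches(fusion_deep, context_out):
    """Concatenate the two branch outputs along channels; spatial sizes must match."""
    if fusion_deep.shape[:2] != context_out.shape[:2]:
        raise ValueError("branch outputs must share height and width")
    return np.concatenate([fusion_deep, context_out], axis=-1)

rng = np.random.default_rng(2)
deep_backbone = rng.standard_normal((16, 16, 64))  # stand-in deepest backbone map
deep_fusion = rng.standard_normal((16, 16, 32))  # stand-in deepest fusion-branch map
gcim_out = global_context_block(deep_backbone, init_gc_params(64))
print(fuse_branches(deep_fusion, gcim_out).shape)  # (16, 16, 96)
```

Because the context vector is computed once and broadcast, this kind of block costs time linear in the number of pixels, unlike a full non-local block whose pairwise attention grows quadratically.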
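The reported scores pair mIoU with F1. The abstract does not state the averaging convention, so the pure-Python sketch below assumes one common choice for binary water masks: mIoU is the mean of the water and background IoUs, and F1 is computed for the water class as 2TP/(2TP+FP+FN), which equals the harmonic mean of precision and recall. Masks are flattened lists of 0/1 labels; the function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for flattened binary masks (1 = water, 0 = background)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def miou_and_f1(pred, truth):
    """Mean IoU over {water, background} and F1 for the water class."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou_water = _ratio(tp, tp + fp + fn)
    iou_background = _ratio(tn, tn + fn + fp)  # background's FP/FN are water's FN/FP
    miou = (iou_water + iou_background) / 2
    f1_water = _ratio(2 * tp, 2 * tp + fp + fn)
    return miou, f1_water

pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(miou_and_f1(pred, truth))  # mIoU 0.5, F1 about 0.667
```

Whether the counts are accumulated over the whole test set before dividing, or scores are averaged per image, is another convention choice that can shift the resulting numbers.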


ZHOU Ke, CHANG Ranran, XU Xizhi, MIAO Ru, ZHANG Guangyu, WANG Jiaqian. Water Body Extraction Method Based on ConvNeXt and Dual Feature Extraction Branch[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(5): 1264-1279.

