Building Extraction Algorithm for Remote Sensing Images by Fusing Partial Convolution and Residual Refinement

doi:10.3778/j.issn.1673-9418.2310073

Abstract

Abstract: In high-spatial-resolution remote sensing images, the background closely resembles buildings. As a result, networks struggle to handle buildings of different sizes, pixels in building boundary regions are confused with the background, and building boundaries are easily missed. To address these problems, this paper proposes UUNet, a building extraction algorithm for remote sensing images that fuses partial convolution and residual refinement, with U-Net as the baseline network. First, the encoder is improved. Two Conv4×4 layers are added at the front of the encoder to enlarge the receptive field from the outset and capture more feature information from the remote sensing image; a PC module built from partial convolution (PConv3×3) strengthens the encoder's ability to extract multi-scale building features; and Conv2×2 performs 2× downsampling to reduce the loss of building feature information. Second, the number of parameters is reduced: a three-layer structure cropped from the U-Net decoder serves as the UUNet decoder. Finally, an improved residual refinement module is added. A U-shaped residual refinement module, cropped to a three-layer structure, is built at the decoder output to further refine the coarse building feature maps produced by the decoder and make building edges clearer. Skip connections link the network decoder to the encoder of the U-shaped residual refinement module to preserve the initial features, and SimAM is embedded in the refinement module to strengthen attention on buildings, reduce boundary blurring and improve the quality of extracted object boundaries.
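The channel-splitting idea behind PConv3×3 can be illustrated with a small sketch. It assumes that PConv here follows the FasterNet-style partial convolution, in which a regular 3×3 convolution is applied to only the first fraction of the channels and the remaining channels pass through unchanged. The function names, the 1/4 channel ratio (`n_div=4`) and the pure-Python nested-list layout are illustrative assumptions, not the authors' implementation, and the paper's PC module may combine PConv with other layers not shown here.

```python
def conv3x3_same(channels, kernels):
    """Naive 3x3 convolution with zero padding 1 and stride 1.

    channels: list of C_in planes, each an H x W nested list.
    kernels:  nested list shaped [C_out][C_in][3][3].
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for k_out in kernels:
        plane = [[0.0] * w for _ in range(h)]
        for c_in, k in enumerate(k_out):
            src = channels[c_in]
            for i in range(h):
                for j in range(w):
                    acc = 0.0
                    for di in range(3):
                        for dj in range(3):
                            y, x = i + di - 1, j + dj - 1
                            # Positions outside the image count as zero.
                            if 0 <= y < h and 0 <= x < w:
                                acc += k[di][dj] * src[y][x]
                    plane[i][j] += acc
        out.append(plane)
    return out


def partial_conv3x3(x, kernels, n_div=4):
    """Hypothetical PConv sketch: convolve the first C // n_div channels only.

    x:       list of C planes (H x W nested lists).
    kernels: shaped [C_p][C_p][3][3], where C_p = C // n_div (assumed ratio).
    """
    c_p = len(x) // n_div
    conv_part = conv3x3_same(x[:c_p], kernels)
    # Untouched channels are concatenated back after the convolved ones.
    return conv_part + x[c_p:]
```

Because only `C // n_div` channels are convolved, the cost of the 3×3 convolution shrinks roughly with the square of that ratio, which is the motivation usually given for this kind of operator.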
In ablation experiments on Satellite dataset II (East Asia), UUNet improves on U-Net in IoU_Building, IoU_Background, F1, OA and mIoU by 2.78, 0.12, 1.91, 0.19 and 1.45 percentage points, respectively, indicating that UUNet outperforms the baseline network. Furthermore, comparative experiments on both Satellite dataset II (East Asia) and the WHU dataset show that UUNet performs better than existing mainstream algorithms and improves building extraction from high-resolution remote sensing images.
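For readers unfamiliar with the reported metrics, the sketch below computes them from a binary confusion matrix using their standard definitions. It assumes that F1 is computed for the building class and that mIoU is the mean of the building and background IoU; the function name and the pixel counts in the usage note are hypothetical and are not taken from the paper.

```python
def building_metrics(tp, fp, fn, tn):
    """Standard binary segmentation metrics from pixel counts.

    tp: building pixels predicted as building
    fp: background pixels predicted as building
    fn: building pixels predicted as background
    tn: background pixels predicted as background
    """
    iou_building = tp / (tp + fp + fn)
    iou_background = tn / (tn + fn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    oa = (tp + tn) / (tp + fp + fn + tn)  # overall accuracy
    miou = (iou_building + iou_background) / 2
    return {
        "IoU_Building": iou_building,
        "IoU_Background": iou_background,
        "F1": f1,
        "OA": oa,
        "mIoU": miou,
    }
```

Calling `building_metrics(80, 10, 10, 900)` on these made-up counts shows why background IoU and OA move far less than building IoU when buildings occupy a small share of the pixels, which is consistent with the small gains reported for those two metrics.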

Key words: high-resolution remote sensing imagery, building extraction, boundary smoothing, multi-scale features, U-Net, partial convolution
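The SimAM attention named in the abstract is parameter-free, and its per-channel computation can be sketched in a few lines. The version below follows the commonly published SimAM formulation (an energy term based on each pixel's squared deviation from the channel mean, passed through a sigmoid); the function name, the nested-list layout and the default `e_lambda=1e-4` are assumptions for illustration, and how the authors embed SimAM in their refinement module is not shown.

```python
import math


def simam_channel(plane, e_lambda=1e-4):
    """Apply SimAM-style attention to one H x W channel (nested lists).

    Requires at least two pixels, since the variance uses H*W - 1.
    """
    vals = [v for row in plane for v in row]
    n = len(vals) - 1
    mu = sum(vals) / len(vals)
    # Squared deviation of every pixel from the channel mean.
    d2 = [[(v - mu) ** 2 for v in row] for row in plane]
    var = sum(sum(row) for row in d2) / n
    out = []
    for row, drow in zip(plane, d2):
        new_row = []
        for v, d in zip(row, drow):
            energy = d / (4 * (var + e_lambda)) + 0.5
            weight = 1 / (1 + math.exp(-energy))  # sigmoid
            new_row.append(v * weight)
        out.append(new_row)
    return out
```

Pixels that deviate more from their channel mean receive a larger weight, so distinctive responses such as building edges are emphasized without adding learnable parameters.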

侯佳兴, 齐向明, 郝明, 张进. 融合Partial卷积与残差细化的遥感影像建筑物提取算法[J]. 计算机科学与探索, 2024, 18(10): 2712-2726.

HOU Jiaxing, QI Xiangming, HAO Ming, ZHANG Jin. Building Extraction Algorithm for Remote Sensing Images by Fusing Partial Convolution and Residual Refinement[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2712-2726.
