Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (2): 453-464. DOI: 10.3778/j.issn.1673-9418.2305005

• Graphics·Image •

GUS-YOLO Remote Sensing Target Detection Algorithm Introducing Context Information and Attention Gate

ZHANG Huawei, ZHANG Wenfei, JIANG Zhanjun, LIAN Jing, WU Baijing

  1. College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Online: 2024-02-01 Published: 2024-02-01

Abstract: Remote sensing target detection algorithms based on the general-purpose YOLO (you only look once) series have two shortcomings: they do not make full use of the global context information of the image, and their feature fusion pyramids neither narrow the semantic gap between the fused features nor suppress interference from redundant information. Building on the strengths of YOLO, this paper proposes the GUS-YOLO (network of global context extraction unit and attention gate-based YOLOS) algorithm. Its backbone network, Global Backbone, makes full use of global context information. In addition, the algorithm introduces the Attention Gate module into the top-down path of the feature fusion pyramid to emphasize necessary feature information and suppress redundant information. Furthermore, this paper designs the best-suited network structure for the Attention Gate module and proposes the resulting feature fusion structure, U-neck. Finally, because the ReLU activation function can cause the model's gradients to stop updating, the Attention Gate module replaces it with the learnable SMU (smooth maximum unit) activation function, which improves model robustness. On the NWPU VHR-10 remote sensing dataset, the algorithm improves on YOLOv7 by 1.64 percentage points on the lenient metric mAP0.50 and by 9.39 percentage points on the strict metric mAP0.75. It also achieves better detection performance than seven current mainstream detection algorithms.

Key words: remote sensing image, Global Backbone, Attention Gate, SMU, U-neck
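
The abstract's point about swapping ReLU for SMU inside the Attention Gate can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly published SMU form, a smooth approximation of max(x, αx), and a generic additive attention gate reduced to scalar features. The names `smu`, `attention_gate`, `w_x`, `w_g` and `psi`, and the default values of `alpha` and `mu`, are hypothetical choices for illustration only.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def smu(x: float, alpha: float = 0.25, mu: float = 1.0) -> float:
    """Smooth approximation of max(x, alpha * x).

    Assumed form: ((1 + a) * x + (1 - a) * x * erf(mu * (1 - a) * x)) / 2.
    In a trainable layer mu would be a learnable parameter; here it is a
    plain float, and both defaults are illustrative assumptions.
    """
    return 0.5 * ((1 + alpha) * x + (1 - alpha) * x * math.erf(mu * (1 - alpha) * x))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def attention_gate(skip, gate, w_x=1.0, w_g=1.0, psi=1.0, act=smu):
    """Generic additive attention gate on per-position scalar features.

    coefficient_i = sigmoid(psi * act(w_x * skip_i + w_g * gate_i))
    output_i      = skip_i * coefficient_i
    The scalar weights stand in for the convolutions of a real module
    (hypothetical simplification).
    """
    coeffs = [sigmoid(psi * act(w_x * s + w_g * g)) for s, g in zip(skip, gate)]
    gated = [s * c for s, c in zip(skip, coeffs)]
    return gated, coeffs


if __name__ == "__main__":
    # ReLU has zero slope for negative inputs; SMU keeps a nonzero slope there.
    print("ReLU grad at -2:", numeric_grad(relu, -2.0))
    print("SMU  grad at -2:", numeric_grad(smu, -2.0))
    # The gating signal raises or lowers the weight given to each skip feature.
    print(attention_gate([2.0, 2.0], [3.0, -3.0]))
```

With ReLU, a unit whose pre-activation stays negative receives no gradient; the SMU slope on the negative side stays nonzero, which is the property the abstract relies on. The gate coefficients always fall strictly between 0 and 1, so the skip features are scaled down rather than replaced.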