Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (3): 635-645. DOI: 10.3778/j.issn.1673-9418.2205114

• Graphics · Image •

Object Detection in UAV Aerial Images with Multi-level Feature Fusion

徐光达,毛国君   

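  1. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118
    2. Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou 350118
  • Publication date: 2023-03-01 Release date: 2023-03-01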

Aerial Image Object Detection of UAV Based on Multi-level Feature Fusion

XU Guangda, MAO Guojun   

  1. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
    2. Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou 350118, China
  • Online: 2023-03-01 Published: 2023-03-01

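Abstract: UAV aerial images contain many small-target samples that offer few extractable features and are easily disturbed by the background. To address this, a multi-level feature fusion detection algorithm for UAV aerial images is proposed on the basis of YOLOv5. First, a high-resolution feature map from the shallow network is added to retain sufficient target feature information, and a detection head at the corresponding scale is added to strengthen the detection of tiny targets. Second, because the information in feature maps at different levels contributes differently to small-object detection, a multi-level feature fusion layer is designed to integrate information from different receptive fields: it aggregates context by fusing feature maps from different levels, and it adaptively generates an output weight for each level's feature map according to the sizes of the training target samples, dynamically optimizing the representational power of the feature maps. Finally, to reduce the conflict between the feature information required by different tasks during prediction, the original coupled detection head is replaced with a decoupled head, which handles classification and localization better. In experiments on the public VisDrone dataset, the method achieves a mean average precision of 35.5%, which is 4.4 percentage points higher than the YOLOv5 baseline, and it also attains higher detection accuracy than mainstream detection methods. The results show that the proposed method performs well on small-object detection tasks.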

Keywords: object detection, feature fusion, aerial image, feature learning

Abstract: Aerial images captured by unmanned aerial vehicles (UAVs) contain many small targets that carry little feature information and are susceptible to background interference. To address this problem, a multi-level feature fusion algorithm for UAV aerial image detection is proposed based on YOLOv5 (you only look once version 5). Firstly, a high-resolution feature map from the shallow network is added to preserve the feature information of small targets, and a detection head at the corresponding scale is added to improve the detection of tiny targets. Secondly, since features at different levels contribute differently to small object detection, a multi-level feature fusion layer is designed to integrate information from different receptive fields. Context information is aggregated by fusing feature maps from different levels, and the output weight of each level's feature map is generated adaptively according to the size of the training target samples, so that the representational ability of the feature maps is optimized dynamically. Finally, to reduce the conflict between the feature information required by different tasks during prediction, the original coupled head is replaced with a decoupled head, so that classification and localization can both be performed better. Experimental results on the public dataset VisDrone show that the mean average precision (mAP) of the method reaches 35.5%, which is 4.4 percentage points higher than that of the baseline method YOLOv5, and that its detection accuracy is also higher than that of mainstream detection methods. The results show that the proposed method performs well on small object detection tasks.

Key words: object detection, feature fusion, aerial image, feature learning
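
The weighted fusion of feature maps from different levels described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the per-level weights are produced or how the maps are resized, so everything below is an assumption made for illustration: the helper names (`softmax`, `upsample_nearest`, `fuse_levels`), the use of one scalar logit per level normalized by softmax, and nearest-neighbour upsampling to the finest resolution. It is not the authors' implementation, and in a trained network the logits would be learned rather than fixed.

```python
import math


def softmax(logits):
    """Turn per-level logits into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D map (list of rows) by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_levels(fmaps, logits):
    """Fuse single-channel maps of different resolutions into one map.

    fmaps[0] is assumed to be the finest (shallowest) level; every other
    level is assumed to be smaller by an integer factor. The logits are
    hypothetical per-level scalars standing in for the adaptive weights.
    """
    weights = softmax(logits)
    height, width = len(fmaps[0]), len(fmaps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(fmaps, weights):
        factor = height // len(fmap)
        resized = upsample_nearest(fmap, factor) if factor > 1 else fmap
        for i in range(height):
            for j in range(width):
                fused[i][j] += weight * resized[i][j]
    return fused, weights


# Toy single-channel maps at three levels (4x4, 2x2, 1x1).
shallow = [[1.0] * 4 for _ in range(4)]
middle = [[2.0] * 2 for _ in range(2)]
deep = [[4.0]]

# Equal logits give equal weights, so each fused cell is the plain average.
fused, weights = fuse_levels([shallow, middle, deep], [0.0, 0.0, 0.0])
print(weights)
print(fused[0][0])
```

Raising the logit of the shallow level relative to the others would shift the fused map toward the high-resolution features, which is the kind of behaviour one would expect to favour small targets under these assumptions.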