Aerial Image Object Detection of UAV Based on Multi-level Feature Fusion

doi:10.3778/j.issn.1673-9418.2205114

Abstract

Aerial images captured by unmanned aerial vehicles (UAVs) contain many small objects. These objects yield little extractable feature information and are easily disturbed by the background. To address this, a multi-level feature fusion detection algorithm for UAV aerial images is proposed, built on YOLOv5 (you only look once version 5). First, a high-resolution feature map from the shallow layers of the network is added to retain sufficient object detail, together with a detection head at the corresponding scale, which strengthens the detection of tiny objects. Second, because feature maps at different levels contribute differently to small object detection, a multi-level feature fusion layer is designed to integrate information from different receptive fields. It aggregates context by fusing feature maps from several levels, and it adaptively generates an output weight for each level according to the sizes of the training object samples, dynamically optimizing the representational capacity of the fused features. Finally, to reduce the conflict between the features required by the classification and localization tasks during prediction, the original coupled detection head is replaced with a decoupled head, so that both tasks are performed more effectively. Experiments on the public VisDrone dataset show that the method reaches a mean average precision (mAP) of 35.5%, 4.4 percentage points higher than the YOLOv5 baseline, and that it also achieves higher detection accuracy than the mainstream detection methods it is compared with. These results indicate that the proposed method performs well on small object detection.

Key words: object detection, feature fusion, aerial image, feature learning
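The first modification adds a prediction head on a shallower, higher-resolution feature map. The abstract does not give the stride of that map, so the sketch below assumes a stride-4 (P2) level added to YOLOv5's default stride-8/16/32 heads, a 640 × 640 input and three anchors per grid cell. It simply counts how the prediction grids and the number of candidate boxes change. All helper names are hypothetical.

```python
# Hypothetical helpers: count prediction grids for a YOLOv5-style detector.
# Assumptions: square input, 3 anchors per cell, default strides 8/16/32,
# plus an extra stride-4 head standing in for the added shallow level.

BASELINE_STRIDES = [8, 16, 32]      # YOLOv5 P3, P4, P5 heads
WITH_P2_STRIDES = [4, 8, 16, 32]    # assumed extra high-resolution P2 head


def grid_sizes(img_size, strides):
    """Side length of the square prediction grid produced at each stride."""
    return [img_size // s for s in strides]


def num_predictions(img_size, strides, anchors_per_cell=3):
    """Total number of candidate boxes emitted by all heads."""
    return sum(g * g * anchors_per_cell for g in grid_sizes(img_size, strides))


def cells_spanned(obj_size_px, stride):
    """How many grid cells a square object covers along one side."""
    return obj_size_px / stride


if __name__ == "__main__":
    img = 640  # assumed input resolution
    print(grid_sizes(img, BASELINE_STRIDES), num_predictions(img, BASELINE_STRIDES))
    print(grid_sizes(img, WITH_P2_STRIDES), num_predictions(img, WITH_P2_STRIDES))
    # A 10-pixel object covers 1.25 cells at stride 8 but 2.5 cells at stride 4.
    print(cells_spanned(10, 8), cells_spanned(10, 4))
```

Under these assumptions the extra head doubles the side of the finest grid (160 × 160 instead of 80 × 80) and raises the candidate count from 25,200 to 102,000. That is the computational price for letting a 10-pixel object span 2.5 cells instead of 1.25.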
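The multi-level fusion layer is described only at a high level: feature maps from several levels are brought together, and each level receives an adaptively generated weight. The sketch below is a minimal illustration of that idea. It assumes single-channel square maps stored as nested lists, nearest-neighbor upsampling and average-pool downsampling to a common resolution, and softmax-normalized per-level logits in the style of ASFF. In the paper the weights depend on the sizes of the training objects; here the `logits` argument is a hypothetical stand-in for whatever produces them.

```python
import math

# Minimal sketch of softmax-weighted multi-level fusion (ASFF-style).
# Assumptions, not taken from the paper: single-channel square maps as nested
# lists, sizes related by integer factors, nearest-neighbor upsampling,
# average-pool downsampling, and `logits` standing in for the learned,
# object-size-dependent level weights.

Map = list[list[float]]


def resize(fmap: Map, size: int) -> Map:
    """Resize a square map to size x size by an integer factor."""
    n = len(fmap)
    if n == size:
        return [row[:] for row in fmap]
    if n < size:  # nearest-neighbor upsampling
        f = size // n
        return [[fmap[i // f][j // f] for j in range(size)] for i in range(size)]
    f = n // size  # average-pool downsampling
    return [
        [
            sum(fmap[i * f + di][j * f + dj] for di in range(f) for dj in range(f)) / (f * f)
            for j in range(size)
        ]
        for i in range(size)
    ]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax, so level weights are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(levels: list[Map], logits: list[float], size: int) -> tuple[Map, list[float]]:
    """Bring every level to `size`, then take the per-level weighted sum."""
    weights = softmax(logits)
    resized = [resize(m, size) for m in levels]
    fused = [
        [sum(w * m[i][j] for w, m in zip(weights, resized)) for j in range(size)]
        for i in range(size)
    ]
    return fused, weights


if __name__ == "__main__":
    p2 = [[1.0] * 4 for _ in range(4)]  # 4x4, finest level
    p3 = [[2.0, 4.0], [6.0, 8.0]]      # 2x2
    p4 = [[10.0]]                       # 1x1, coarsest level
    fused, w = fuse([p2, p3, p4], logits=[1.0, 0.0, -1.0], size=2)
    print([round(x, 3) for x in w])
    print(fused)
```

Because the weights pass through a softmax they always sum to 1. Raising one level's logit therefore shifts emphasis toward that level without changing the overall scale of the fused map.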
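The decoupled head replaces a single projection that emits box, objectness and class outputs together with separate classification and regression branches. For reference, a standard YOLOv5 head on VisDrone's 10 categories emits 3 × (5 + 10) = 45 channels per grid cell from one shared convolution. The sketch below models each layer as a plain linear map over the feature vector at one grid cell (real heads are convolutional). It uses an arbitrarily chosen hidden width and attaches objectness to the regression branch as YOLOX does. None of these details are specified in the abstract, and the class and function names are hypothetical.

```python
import random

# Sketch contrasting a coupled and a decoupled detection head at one grid cell.
# Assumptions: layers are plain linear maps (real heads use convolutions),
# the hidden width is arbitrary, and objectness sits on the regression branch
# as in YOLOX. The paper's exact branch design is not given in the abstract.


def linear(x, w, b):
    """y = W x + b, with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init_layer(out_dim, in_dim, rng):
    """Small random weights and zero bias for a linear layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


def relu(v):
    return [max(0.0, t) for t in v]


class CoupledHead:
    """One shared projection emits box (4), objectness (1) and class scores (C)."""

    def __init__(self, in_dim, num_classes, rng):
        self.layer = init_layer(5 + num_classes, in_dim, rng)

    def __call__(self, x):
        out = linear(x, *self.layer)
        return {"box": out[:4], "obj": out[4:5], "cls": out[5:]}


class DecoupledHead:
    """Separate stems, so classification and localization share no head weights."""

    def __init__(self, in_dim, num_classes, hidden, rng):
        self.cls_stem = init_layer(hidden, in_dim, rng)
        self.reg_stem = init_layer(hidden, in_dim, rng)
        self.cls_out = init_layer(num_classes, hidden, rng)
        self.box_out = init_layer(4, hidden, rng)
        self.obj_out = init_layer(1, hidden, rng)

    def __call__(self, x):
        c = relu(linear(x, *self.cls_stem))  # classification features
        r = relu(linear(x, *self.reg_stem))  # localization features
        return {
            "box": linear(r, *self.box_out),
            "obj": linear(r, *self.obj_out),
            "cls": linear(c, *self.cls_out),
        }


if __name__ == "__main__":
    rng = random.Random(0)
    feat = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy 16-d cell feature
    num_classes = 10                                     # VisDrone-DET categories
    print({k: len(v) for k, v in CoupledHead(16, num_classes, rng)(feat).items()})
    print({k: len(v) for k, v in DecoupledHead(16, num_classes, 32, rng)(feat).items()})
```

Both heads produce outputs of the same shape. The difference is that in the decoupled version the gradients from the classification loss and the box-regression loss update separate stem weights instead of competing for one shared projection.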
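The reported gain is an absolute difference in percentage points. Using only the two figures given in the abstract, the implied baseline mAP and the relative improvement are as follows. The abstract does not state the IoU threshold at which mAP is measured.

```latex
\begin{aligned}
\mathrm{mAP}_{\text{YOLOv5}} &= 35.5\% - 4.4\ \text{percentage points} = 31.1\%,\\
\text{relative gain} &= \frac{35.5 - 31.1}{31.1} \approx 0.141 \approx 14.1\%.
\end{aligned}
```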

XU Guangda, MAO Guojun. Aerial Image Object Detection of UAV Based on Multi-level Feature Fusion[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 635-645.
