Small Object Detection Based on Enhanced Feature Pyramid and Focal-AIoU Loss

doi:10.3778/j.issn.1673-9418.2403006

Abstract: Unmanned aerial vehicle (UAV) aerial images are characterized by small target scales and complex backgrounds, so generic object detectors applied directly to such images struggle to reach satisfactory accuracy. Building on YOLOv8, this paper proposes CFE-YOLO (cross-level feature-fusion enhanced YOLO), a small object detection model that combines an enhanced feature pyramid with a focal loss. First, a cross-level feature-fusion enhanced pyramid network (CFEPN) improves the traditional feature pyramid structure by fusing attention feature maps across levels. It adds a high-resolution feature map from the shallow layers and removes the deep detection head to meet the needs of small object detection. Second, a focal loss based on area intersection over union (IoU) combines the ideas of the Complete-IoU (CIoU) loss and Focal loss to further improve small object detection. Finally, a lightweight spatial pyramid pooling module built with depthwise separable convolutions reduces the parameter count while maintaining detection accuracy. Extensive experiments were conducted on two UAV aerial datasets, VisDrone and TinyPerson. On these datasets CFE-YOLO improves mAP@0.50 over the baseline model by 4.72 and 5.58 percentage points, respectively, while reducing the parameter count by 37.74%. It also achieves higher accuracy than other state-of-the-art algorithms.
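The abstract describes a focal loss based on area IoU (called Focal-AIoU in the English title) that draws on CIoU and Focal loss. It does not give the formula. The sketch below is an illustration only. It computes the standard CIoU loss for axis-aligned boxes and multiplies it by IoU^γ, the focal reweighting used by Focal-EIoU. The function names, the (x1, y1, x2, y2) box convention and the default γ = 0.5 are assumptions. This is not claimed to be the paper's loss.

```python
import math
from typing import Tuple

# Box as (x1, y1, x2, y2); assumes x2 > x1 and y2 > y1 (positive width/height).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Plain area intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def ciou_loss(pred: Box, gt: Box) -> float:
    """Complete-IoU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    u = iou(pred, gt)
    # rho^2: squared distance between the two box centers
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # v: aspect-ratio consistency term; alpha: its trade-off weight
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - u) + v) if v > 0 else 0.0
    return 1 - u + rho2 / c2 + alpha * v


def focal_ciou_loss(pred: Box, gt: Box, gamma: float = 0.5) -> float:
    """Hypothetical focal variant: CIoU loss scaled by IoU**gamma (Focal-EIoU style)."""
    return iou(pred, gt) ** gamma * ciou_loss(pred, gt)


if __name__ == "__main__":
    pred, gt = (0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0)
    print(iou(pred, gt), ciou_loss(pred, gt), focal_ciou_loss(pred, gt))
```

This weighting makes low-overlap boxes contribute less to the loss. A box with IoU = 0 contributes nothing. The abstract does not say whether CFE-YOLO weights small-object samples this way.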

Key words: small object detection, aerial images, feature pyramid, loss function
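The abstract says the lightweight spatial pyramid pooling module saves parameters by using depthwise separable convolutions. The arithmetic behind that saving can be sketched per layer as below. The kernel size 3 and channel width 256 are illustrative assumptions, not values from the paper. The sketch counts convolution weights only, not the whole module. Bias is off by default, as is common when BatchNorm follows the convolution.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weight count of a k x k standard convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """k x k depthwise conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    k, c = 3, 256  # hypothetical kernel size and channel width
    std = standard_conv_params(k, c, c)
    dws = depthwise_separable_params(k, c, c)
    print(std, dws, round(dws / std, 4))
```

For these settings the depthwise separable version needs 67,840 weights versus 589,824. Without bias, the ratio is 1/C_out + 1/k², about 0.115. The abstract's 37.74% reduction refers to the whole model and cannot be derived from this per-layer ratio.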

SHI Yu, WANG Le, YAO Yepeng, MAO Guojun. Small Object Detection Based on Enhanced Feature Pyramid and Focal-AIoU Loss[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(3): 693-702.

