Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (3): 693-702. DOI: 10.3778/j.issn.1673-9418.2403006

• Graphics·Image •

Small Object Detection Based on Enhanced Feature Pyramid and Focal-AIoU Loss

SHI Yu, WANG Le, YAO Yepeng, MAO Guojun   

  1. College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
  2. Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou 350118, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100084, China
  4. Technology Innovation Center of Factored Transaction Data in Tourist Attractions, Ministry of Culture and Tourism, Fuzhou 350000, China
  • Online: 2025-03-01 Published: 2025-02-28

Abstract: Images captured by unmanned aerial vehicles (UAVs) feature small targets and complex backgrounds, so applying generic object detection methods to them directly rarely achieves satisfactory recognition accuracy. Building on YOLOv8, this paper proposes CFE-YOLO (cross-level feature-fusion enhanced-YOLO), a small object detection model that combines an enhanced feature pyramid with a focal loss. First, a cross-level feature-fusion enhanced pyramid network (CFEPN) is designed that improves the traditional feature pyramid structure by fusing attention feature maps across levels; it adds high-resolution feature maps from shallow layers and removes deep detection heads to suit small object detection. Second, a focal loss function based on area intersection over union is designed by combining the ideas of Complete-IoU and Focal loss, further improving the detection of small objects. Finally, a lightweight spatial pyramid pooling module is implemented with depth-wise separable convolutions, reducing the parameter count while maintaining detection accuracy. Extensive experiments on the UAV datasets VisDrone and TinyPerson show that CFE-YOLO improves mAP0.50 over the baseline by 4.72 and 5.58 percentage points, respectively, while reducing the parameter count by 37.74%. It also achieves higher accuracy than other advanced algorithms.

Key words: small object detection, aerial images, feature pyramid, loss function
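
The abstract credits depth-wise separable convolutions for part of the reduction in parameter count. As a rough illustration of why that substitution saves parameters, the sketch below counts the weights of a standard k×k convolution and of its depth-wise separable replacement (a per-channel k×k convolution followed by a 1×1 point-wise convolution). The function names and the 256-channel, 3×3 example are hypothetical and are not taken from the paper; the sketch does not reproduce the 37.74% figure, which refers to the whole model.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Parameter count of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Parameter count of a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer shape only (an assumption, not a layer from CFE-YOLO).
    c_in, c_out, k = 256, 256, 3
    standard = conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)
    saving = 100.0 * (1.0 - separable / standard)
    print(f"standard:  {standard} parameters")
    print(f"separable: {separable} parameters")
    print(f"reduction for this single layer: {saving:.2f}%")
```

Without bias terms, the ratio of separable to standard parameters is 1/c_out + 1/k², so the saving grows with both the kernel size and the number of output channels.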