Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (12): 2967-2983. DOI: 10.3778/j.issn.1673-9418.2210120

• Graphics·Image •

Small Object Detection Based on Two-Stage Calculation Transformer

XU Shoukun, GU Jianan, ZHUANG Lihua, LI Ning, SHI Lin, LIU Yi   

  1. College of Computer and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu 213164, China
  • Online:2023-12-01 Published:2023-12-01

Abstract: Although small object detection has made considerable progress, several problems remain. For example, small objects carry little information of their own, so their features are hard to extract and the original feature information is easily lost, which degrades detection performance. To address this problem, this paper proposes a small object detection network based on a two-stage calculation Transformer (TCT). First, a two-stage calculation Transformer is embedded in the backbone feature extraction network for feature enhancement. Building on the traditional single-stage computation of the Transformer values, it uses multiple 1D dilated convolutional layer branches with different feature fusion schemes to obtain global self-attention feature weights, improving feature representation and information interaction. Second, this paper proposes an efficient residual connection module that improves the inefficient convolution and activation layers in the existing CSPLayer, which promotes information flow and helps the network learn richer contextual detail features. Finally, this paper proposes a feature fusion and refinement module to fuse multi-scale features and strengthen object feature representation. Quantitative and qualitative experiments on the PASCAL VOC2007+2012, COCO2017, and TinyPerson datasets show that, compared with YOLOX, the proposed algorithm extracts object features more effectively and achieves higher detection accuracy on small objects.
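The abstract names multiple 1D dilated convolution branches as the building block of the TCT but does not give their configuration. The following is only a minimal sketch of that general building block, not the authors' implementation: the function names, the kernel, the dilation rates, and the fusion by element-wise averaging are all assumptions chosen for illustration. It shows how branches with different dilation rates see different receptive fields over the same 1D sequence before being fused.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form)."""
    k = len(kernel)
    span = (k - 1) * dilation              # distance covered by the kernel taps
    pad_left = span // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output value."""
    return (kernel_size - 1) * dilation + 1


def multi_branch(signal, kernel, dilations):
    """Run one branch per dilation rate and fuse them.

    Fusion by element-wise averaging is an assumed, simplest-possible choice;
    the paper describes its own fusion schemes, which are not reproduced here.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) / len(branches) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    kernel = [1.0, 1.0, 1.0]               # hypothetical weights
    for d in (1, 2):
        print(d, receptive_field(len(kernel), d), dilated_conv1d(impulse, kernel, d))
    print(multi_branch(impulse, kernel, (1, 2)))
```

With a size-3 kernel, dilation 1 covers 3 neighbouring positions while dilation 2 covers 5 positions at the same parameter count, which is why stacking branches with different rates is a common way to mix local and wider context.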

Key words: YOLOX, Transformer, small object detection, feature fusion and refinement