Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (12): 2984-2998. DOI: 10.3778/j.issn.1673-9418.2211102

Graphics·Image

Object Detection Algorithm for 3D Coordinate Attention Path Aggregation Network

TU Xiaomei, BAO Xiao'an, WU Biao, JIN Yuting, ZHANG Qingqi   

1. School of Civil Engineering and Architecture, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang, Zhejiang 322100, China
2. School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou 310018, China
3. School of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
4. School of Informatics, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang, Zhejiang 322100, China
5. The Graduate School of East Asian Studies, Yamaguchi University, Yamaguchi 753-8514, Japan
Online: 2023-12-01; Published: 2023-12-01

Abstract: In practical industrial applications, YOLO-series algorithms do not localize predicted bounding boxes accurately enough, which makes them difficult to apply in real-world scenarios with strict localization requirements. To address this problem, YOLO-T, an object detection algorithm based on a three-dimensional coordinate attention path aggregation network, is proposed. First, shortcut connections are used to fuse cross-layer features in the path aggregation feature pyramid so that its shallow semantic information is retained. Second, building on the coordinate attention mechanism, a three-dimensional coordinate attention (TDCA) model is proposed and used to apply attention weights to the features inside the path aggregation feature pyramid, forming TPA-FPN (TDCA path aggregation feature pyramid network), which retains useful information and removes redundant information. Third, the computation of the loss (cost) matrix in SimOTA (simplified optimal transport assignment), the label assignment strategy, is improved, raising performance without sacrificing efficiency. Finally, depthwise separable convolutions are used in the convolution modules of the backbone feature extraction network to make the model lightweight. Experimental results show that on the PASCAL VOC2007+2012 dataset the algorithm's mAP@0.50 is 1.3 percentage points higher than that of YOLOX-S and its mAP@0.50:0.95 is 3.8 percentage points higher; on the COCO2017 dataset, its mAP@0.50:0.95 improves by 2.4 percentage points.

Key words: object detection, three-dimensional coordinate attention (TDCA), TDCA path aggregation feature pyramid networks (TPA-FPN), YOLOX-S algorithm, improved SimOTA strategy
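The abstract does not spell out how TDCA extends coordinate attention. As background, the original coordinate attention mechanism average-pools a C×H×W feature map along the width and along the height separately, turns the two pooled descriptors into per-row and per-column attention weights, and multiplies them back onto the input. The minimal plain-Python sketch below shows only that pooling-and-reweighting structure, under two stated assumptions: the learned 1×1 convolutions, normalization, and non-linearity that coordinate attention applies before the sigmoid are omitted, and the third branch (pooling along the channel axis) is a hypothetical reading of "three-dimensional", not the authors' TDCA design. The name `three_axis_attention` is illustrative.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # feature map indexed as x[channel][row][col]


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def three_axis_attention(x: Tensor3D) -> Tensor3D:
    """Reweight x by sigmoid gates built from averages along W, H, and C.

    Simplified sketch: no learned layers, and the channel-axis branch is a
    hypothetical third direction, not the published TDCA module.
    """
    c_n, h_n, w_n = len(x), len(x[0]), len(x[0][0])
    # Pool along width: one descriptor per (channel, row), as in coordinate attention.
    pool_h = [[sum(x[c][h]) / w_n for h in range(h_n)] for c in range(c_n)]
    # Pool along height: one descriptor per (channel, column).
    pool_w = [[sum(x[c][h][w] for h in range(h_n)) / h_n for w in range(w_n)]
              for c in range(c_n)]
    # Hypothetical third branch: pool along channels, one descriptor per (row, column).
    pool_c = [[sum(x[c][h][w] for c in range(c_n)) / c_n for w in range(w_n)]
              for h in range(h_n)]
    # Multiply each element by the three direction-aware gates (each in (0, 1)).
    return [[[x[c][h][w]
              * sigmoid(pool_h[c][h])
              * sigmoid(pool_w[c][w])
              * sigmoid(pool_c[h][w])
              for w in range(w_n)]
             for h in range(h_n)]
            for c in range(c_n)]
```

Because every gate lies strictly between 0 and 1, the output keeps the input's shape and never exceeds it in magnitude; a learned version would let the network decide which rows, columns, and positions to emphasize.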
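For reference, the baseline SimOTA label assignment that the paper modifies (as introduced with YOLOX) builds a cost matrix from a classification cost plus a weighted IoU loss, picks a dynamic number k of positive anchors for each ground-truth box from the sum of its top IoUs, and resolves anchors claimed by several ground truths in favor of the lowest cost. The plain-Python sketch below reproduces that baseline only; it is not the improved cost matrix proposed in the paper, whose formula the abstract does not give. It assumes precomputed classification costs and IoUs, uses the YOLOX defaults of an IoU-loss weight of 3.0 and 10 top candidates, and omits the center-region prior YOLOX uses to filter candidate anchors. The function name `simota_assign` is illustrative.

```python
import math
from typing import List


def simota_assign(cls_cost: List[List[float]], ious: List[List[float]],
                  reg_weight: float = 3.0, top_candidates: int = 10) -> List[int]:
    """Baseline SimOTA-style assignment (not the paper's improved cost matrix).

    cls_cost[g][a]: classification cost of anchor a for ground truth g.
    ious[g][a]:     IoU between anchor a's predicted box and ground truth g.
    Returns, per anchor, the index of its assigned ground truth or -1 (background).
    """
    num_gt = len(ious)
    num_anchors = len(ious[0]) if num_gt else 0
    # Cost matrix: classification cost plus weighted IoU loss -log(IoU).
    cost = [[cls_cost[g][a] - reg_weight * math.log(ious[g][a] + 1e-8)
             for a in range(num_anchors)] for g in range(num_gt)]
    matched = [[False] * num_anchors for _ in range(num_gt)]
    for g in range(num_gt):
        # Dynamic k: floor of the sum of the top IoUs, at least 1.
        top = sorted(ious[g], reverse=True)[:min(top_candidates, num_anchors)]
        k = max(1, int(sum(top)))
        # Mark the k lowest-cost anchors as positives for this ground truth.
        for a in sorted(range(num_anchors), key=lambda j: cost[g][j])[:k]:
            matched[g][a] = True
    assigned = [-1] * num_anchors
    for a in range(num_anchors):
        claims = [g for g in range(num_gt) if matched[g][a]]
        if claims:
            # An anchor claimed by several ground truths keeps the cheapest one.
            assigned[a] = min(claims, key=lambda g: cost[g][a])
    return assigned
```

With two ground truths and four anchors, for example, anchor 1 can be among the k cheapest anchors of both boxes; it is then kept only by the box for which its cost is lower, which is the conflict rule any modified cost matrix would also feed into.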
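The lightweighting step relies on the fact that a depthwise separable convolution (a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution) needs far fewer weights than a standard k×k convolution. The sketch below counts weights for both layer types, ignoring biases; the channel sizes in the usage example are arbitrary and not taken from the paper. The ratio of the two counts is 1/C_out + 1/k², so a 3×3 layer keeps roughly one ninth of the weights.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k x c_in kernel per output channel (bias ignored).
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    ds = depthwise_separable_params(k, c_in, c_out)
    print(std, ds, round(ds / std, 4))  # 294912 33920 0.115
```

The same ratio applies to multiply-accumulate operations per output position, which is why the substitution reduces both model size and compute.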
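The two reported metrics differ in how tight a box must be to count as a hit: mAP@0.50 accepts a detection whose IoU with the ground truth is at least 0.50, while the COCO-style mAP@0.50:0.95 averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. Better localization therefore tends to register more strongly in mAP@0.50:0.95, which is consistent with the larger gain the abstract reports on that metric. The sketch computes IoU for axis-aligned boxes given as (x1, y1, x2, y2) and counts how many of the ten thresholds a single detection clears; the box coordinates are made up for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    # Width and height of the intersection rectangle (0 if the boxes do not overlap).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# COCO-style thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, gt: Box) -> int:
    overlap = iou(pred, gt)
    return sum(1 for t in IOU_THRESHOLDS if overlap >= t)


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    shifted = (1.0, 1.0, 11.0, 11.0)
    print(round(iou(shifted, gt), 4))     # 0.6807
    print(thresholds_passed(shifted, gt))  # 4
    print(thresholds_passed(gt, gt))       # 10
```

A box shifted by one unit in each direction still counts as a true positive for mAP@0.50, but it clears only four of the ten thresholds used by mAP@0.50:0.95.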
