Object Detection Algorithm for 3D Coordinate Attention Path Aggregation Network

doi:10.3778/j.issn.1673-9418.2211102

Abstract

In practical industrial applications, YOLO-series algorithms do not localize predicted bounding boxes accurately enough, which makes them hard to apply in real-world scenarios with strict localization requirements. To address this problem, YOLO-T, an object detection algorithm based on a three-dimensional coordinate attention path aggregation network, is proposed. First, shortcut connections are used to fuse the cross-layer features of the path aggregation feature pyramid so that its shallow semantic information is retained. Second, building on the coordinate attention mechanism, a three-dimensional coordinate attention (TDCA) model is proposed and used to apply attention weights to the features in the path aggregation feature pyramid, giving the TDCA path aggregation feature pyramid network (TPA-FPN), which retains useful information and removes redundant information. Third, the loss matrix calculation of SimOTA (simplified optimal transport assignment) in the label assignment strategy is improved, which raises performance without loss of efficiency. Finally, depthwise separable convolution is used to improve the convolution modules in the backbone feature extraction network, making the model lightweight. Experimental results show that, on the PASCAL VOC2007+2012 dataset, the algorithm's detection accuracy mAP@0.50 is 1.3 percentage points higher than that of YOLOX-S and its mAP@0.50:0.95 is 3.8 percentage points higher. On the COCO2017 dataset, the average detection accuracy mAP@0.50:0.95 is improved by 2.4 percentage points.
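As a rough illustration of why depthwise separable convolution makes a backbone lighter, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart. The function names and the channel sizes are assumptions chosen for illustration; they are not taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k conv plus a pointwise 1 x 1 conv (bias omitted)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes only; not the configuration used in YOLO-T.
    c_in, c_out, k = 128, 256, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.4f}")
```

For these sizes the ratio equals 1/c_out + 1/k², so the separable layer needs roughly 11.5% of the weights of the standard one.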

Key words: object detection, three-dimensional coordinate attention (TDCA), TDCA path aggregation feature pyramid networks (TPA-FPN), YOLOX-S algorithm, improved SimOTA strategy
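The mAP@0.50:0.95 metric reported in the abstract follows the COCO convention of averaging AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so it rewards tighter box localization more than mAP@0.50 does. The sketch below is not a full mAP computation; it only shows, for hypothetical boxes, how a single prediction's IoU determines at how many of those thresholds it would count as a match. The helper names and box coordinates are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred, gt):
    """Number of IoU thresholds at which pred would count as matching gt."""
    overlap = iou(pred, gt)
    return sum(1 for t in IOU_THRESHOLDS if overlap >= t)


if __name__ == "__main__":
    gt = (0, 0, 100, 100)      # hypothetical ground-truth box
    loose = (30, 0, 130, 100)  # shifted by 30 pixels
    tight = (10, 0, 110, 100)  # shifted by 10 pixels
    print(thresholds_matched(loose, gt), thresholds_matched(tight, gt))
```

Both hypothetical predictions pass the 0.50 threshold, but the tighter one matches at more thresholds, which is why localization improvements tend to show up more strongly in mAP@0.50:0.95.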

TU Xiaomei, BAO Xiao'an, WU Biao, JIN Yuting, ZHANG Qingqi. Object Detection Algorithm for 3D Coordinate Attention Path Aggregation Network[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(12): 2984-2998.
