Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

A Downsampling Algorithm with Fusion of Different Receptive Field Sizes in Deep Detection Methods

GU Zhenghua (顾正华), LIU Gaqiong (刘嘎琼), SHAO Changbin (邵长斌), YU Hualong (于化龙)

  1. College of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China
  2. Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi, Jiangsu 214122, China

Abstract: The performance of deep object detection models depends largely on the feature representation ability of the backbone network, in which down-sampling is the key step for semantic integration. However, existing down-sampling methods rely on small receptive fields, so the sampled features often lack global structural information. To address this issue, this paper proposes a plug-and-play Dual Path Down-sampling Method (DPDM), which adds a large receptive field sampling branch so that the backbone network better supports subsequent detection. Specifically, while retaining the traditional small receptive field down-sampling operation, DPDM constructs an efficient large receptive field branch that supplies structural information to the sampled features. Drawing on the space-to-depth operation, this branch achieves large receptive field sampling with a conventional small convolution kernel. Dual-path sampling increases the diversity of the sampled features but does not by itself coordinate the two kinds of features. DPDM therefore fuses the two paths through channel concatenation followed by point-wise convolution. Taking the currently leading YOLO series as the baseline, comparative experiments with three models (YOLOX, YOLOv5, YOLOv6) on multiple datasets verify that DPDM improves detection accuracy across these network architectures.

Key words: deep learning, deep object detection, multi-scale object detection, down-sampling strategy
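
The dual-path idea summarized in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the abstract does not specify the layer layout, so the stride-2 3x3 convolution in the small path, the block size of 2, the channel counts, and all function names (`conv2d`, `space_to_depth`, `const_weight`, `dual_path_downsample`) are assumptions chosen for illustration. The sketch only shows why a 3x3 kernel applied after space-to-depth covers a 6x6 window of the original input, and how the two paths could be fused by channel concatenation and a point-wise convolution.

```python
# Illustrative sketch only: layer layout, channel counts, and names are assumptions.

def conv2d(x, weight, stride=1, pad=0):
    """Zero-padded convolution on an H x W x C_in nested list.

    `weight` is indexed as weight[kh][kw][c_in][c_out].
    """
    h, w, c_in = len(x), len(x[0]), len(x[0][0])
    k, c_out = len(weight), len(weight[0][0][0])
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    out = [[[0.0] * c_out for _ in range(w_out)] for _ in range(h_out)]
    for i in range(h_out):
        for j in range(w_out):
            for di in range(k):
                for dj in range(k):
                    r, c = i * stride + di - pad, j * stride + dj - pad
                    if not (0 <= r < h and 0 <= c < w):
                        continue  # zero padding contributes nothing
                    for ci in range(c_in):
                        for co in range(c_out):
                            out[i][j][co] += x[r][c][ci] * weight[di][dj][ci][co]
    return out


def space_to_depth(x, block=2):
    """Move each block x block spatial patch into the channel axis."""
    h, w = len(x), len(x[0])
    assert h % block == 0 and w % block == 0, "spatial size must divide by block"
    return [
        [
            [v for di in range(block) for dj in range(block) for v in x[i + di][j + dj]]
            for j in range(0, w, block)
        ]
        for i in range(0, h, block)
    ]


def const_weight(k, c_in, c_out, value):
    """Hypothetical helper: a k x k x c_in x c_out kernel filled with one value."""
    return [[[[value] * c_out for _ in range(c_in)] for _ in range(k)] for _ in range(k)]


def dual_path_downsample(x, w_small, w_large, w_fuse):
    """Halve the spatial size using a small and a large receptive field path."""
    # Small receptive field: a plain stride-2 3x3 convolution (sees 3x3 input pixels).
    small = conv2d(x, w_small, stride=2, pad=1)
    # Large receptive field: space-to-depth halves the resolution first, so the
    # same 3x3 kernel now spans 3x3 cells of 2x2 pixels, i.e. 6x6 input pixels.
    large = conv2d(space_to_depth(x, 2), w_large, stride=1, pad=1)
    # Channel concatenation followed by a point-wise (1x1) convolution.
    fused_in = [[a + b for a, b in zip(row_s, row_l)] for row_s, row_l in zip(small, large)]
    return conv2d(fused_in, w_fuse, stride=1, pad=0)


x = [[[1.0] for _ in range(8)] for _ in range(8)]  # 8 x 8 input, 1 channel
y = dual_path_downsample(
    x,
    w_small=const_weight(3, 1, 2, 1.0),  # small path: 1 -> 2 channels
    w_large=const_weight(3, 4, 2, 1.0),  # large path: 4 (= 1 * 2 * 2) -> 2 channels
    w_fuse=const_weight(1, 4, 3, 0.25),  # fusion: 2 + 2 -> 3 channels
)
print(len(y), len(y[0]), len(y[0][0]))  # 4 4 3
```

With constant weights and an all-ones input, an interior output position sums 9 input pixels in the small path and 36 in the large path, which makes the difference in receptive field directly visible. In a real detector the kernels would be learned, and the nested-list convolution would be replaced by a framework operator.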