Journal of Frontiers of Computer Science and Technology

• Science Researches •

CARFB: A Plug-and-Play Object Detection Module

YANG Meijun, YAO Ruoxiao, XIE Juanying   

  1. School of Computer Science, Shaanxi Normal University, Xi’an 710119, China

CARFB:即插即用的目标检测模块

杨梅君, 姚若侠, 谢娟英   

  1. 陕西师范大学 计算机科学学院,  西安 710119

Abstract: Coordinate Attention (CA) has two limitations: its average pooling of horizontal and vertical features may discard salient target features, and its ordinary two-dimensional convolution learns small-target features insufficiently. To overcome them, the CARFB (Coordinate Attention and Receptive Field Block) module is proposed. First, max pooling is added to CA's average pooling so that both salient and detailed information of the input features is retained in the horizontal and vertical directions. Second, exploiting the multiple receptive-field sizes of RFB (Receptive Field Block), RFB modules are applied separately in the horizontal and vertical directions in place of CA's single convolution over the concatenated horizontal and vertical features, so that features of targets of different sizes are extracted simultaneously. Third, CBS (Convolution + Batch Normalization + SiLU) modules with convolution kernels of different sizes and strides replace CA's ordinary two-dimensional convolutions to further extract horizontal and vertical features and produce the reweighted output features. The CARFB module thus preserves target position information in the horizontal and vertical directions and extracts highly discriminative features of targets of different sizes through different receptive fields, which gives it a strong feature learning capability. To verify the performance of this plug-and-play module, CARFB is first embedded into the object detector ObjectBox, yielding the ObjectBox-CARFB detector; it is then used to replace the RFB module in RFB net, yielding the CARFB net detector. Experiments on the MS COCO dataset show that ObjectBox-CARFB improves performance across the board, especially on small targets. Experiments on the PASCAL VOC and MS COCO datasets show that CARFB net300 and CARFB net512 outperform the original RFB net300 and RFB net512, respectively, as well as the other compared models. The CARFB module can be embedded into any convolutional neural network to enhance the performance of the original network. It retains more target information, has a stronger feature learning capability, and achieves better detection on targets of different sizes, particularly small targets.
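
To make the pooling change concrete, the following is a minimal pure-Python sketch of direction-wise pooling and reweighting on a single-channel feature map. It is not the authors' implementation. The function names (`directional_pool`, `reweight`), the choice to combine average and max pooling by summation, and the identity `transform` that stands in for the RFB and CBS stages are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def directional_pool(x):
    """Pool a single-channel H x W map along each spatial axis.

    Returns (row_desc, col_desc). row_desc[i] summarises row i (pooled over
    the width) and col_desc[j] summarises column j (pooled over the height).
    Each descriptor is average + max; summing the two is an assumption of
    this sketch, not something stated in the abstract.
    """
    h, w = len(x), len(x[0])
    row_desc = [sum(row) / w + max(row) for row in x]
    cols = [[x[i][j] for i in range(h)] for j in range(w)]
    col_desc = [sum(col) / h + max(col) for col in cols]
    return row_desc, col_desc


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def reweight(x, transform=lambda d: d):
    """Scale every element of x by a per-row gate and a per-column gate.

    `transform` is a hypothetical placeholder for the RFB and CBS stages;
    it defaults to the identity, so only pooling and gating are shown.
    """
    row_desc, col_desc = directional_pool(x)
    row_gate = [sigmoid(v) for v in transform(row_desc)]
    col_gate = [sigmoid(v) for v in transform(col_desc)]
    return [
        [x[i][j] * row_gate[i] * col_gate[j] for j in range(len(x[0]))]
        for i in range(len(x))
    ]


# A map with one isolated peak, loosely mimicking a small target.
feature_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
row_desc, col_desc = directional_pool(feature_map)
print(row_desc)  # [0.0, 5.0, 0.0]
output = reweight(feature_map)
```

In this toy input, average pooling alone would describe the middle row with 1.0, diluting the peak of 4.0 across the row width. Adding the max term keeps the peak visible in the row descriptor, which is the motivation the abstract gives for augmenting CA's average pooling.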

Key words: Object detection, RFB, Coordinate attention, Small targets, Deep learning

摘要: 针对坐标注意力CA(Coordinate Attention)在水平和垂直方向特征的平均池化可能丢失目标显著特征,以及使用二维普通卷积对小目标特征学习不足的情况,提出了CARFB(Coordinate Attention and Receptive Field Block)模块。该模块将CA的平均池化修改为平均+最大池化,以保留输入特征在水平和垂直方向的显著和细节信息;利用RFB(Receptive Field Block)具有不同大小感受野的优势,在水平和垂直方向分别使用RFB模块代替CA的融合特征统一卷积,以同时提取不同大小目标的特征;引入包含不同大小卷积核和步长的CBS(Convolution + Batch Normalization + SiLU)模块,替换CA的二维普通卷积,进一步提取水平和垂直方向的特征,得到重新加权的输出特征。CARFB模块在水平和垂直方向保存目标位置信息,利用不同感受野提取不同大小目标的强辨别性特征,从而具有更强的特征学习能力。为了验证提出的即插即用模块CARFB的性能,首先将其嵌入ObjectBox目标检测框架,得到ObjectBox-CARFB模型;其次用CARFB模块替换RFB net中的RFB模块,得到CARFB net目标检测模型。MS COCO数据集的实验测试表明,ObjectBox-CARFB模型的性能全面提升,尤其对小目标的检测性能提升突出;PASCAL VOC和MS COCO数据集的实验结果表明,CARFB net300和CARFB net512的目标检测能力分别优于原始RFB net300和RFB net512模型,并优于其他同系列对比模型。提出的CARFB模块具有更强特征学习能力,对不同尺度目标均能取得较好检测效果,特别是在小目标检测方面,效果提升显著。提出的CARFB模块可以嵌入到任何一个卷积神经网络,能保存更多目标信息,具有更强特征学习能力,提升网络性能,对不同尺度目标均能取得较好检测效果,尤其对小目标的检测效果提升显著。

关键词: 目标检测, RFB, 坐标注意力, 小目标, 深度学习