Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (2): 346-353. DOI: 10.3778/j.issn.1673-9418.1912011

• Graphics and Image •

Detection Algorithm of Small Target in Receptive Field Block

CHEN Haoran, PENG Li   

  1. Engineering Research Center of Internet of Things Technology Applications (School of Internet of Things Engineering, Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China
  • Online: 2021-02-01 Published: 2021-02-01





SSD (single shot multibox detector), an earlier one-stage detection algorithm, increases the number of computation channels after each 3×3 convolution during feature extraction in the backbone network. In addition, SSD turns the extracted features directly into feature maps and feeds each one separately into the prediction module, so information is poorly shared between layers. During detection, the network is dominated by large targets, so small objects are more likely to be missed and their detection rate is lower. Building on SSD, this paper incorporates a receptive field block (RFB) based on feature fusion. On the feature-extraction backbone, a feature fusion module is combined with receptive-field-based feature extraction to improve the detection of small targets. The improved framework achieves a mean average precision (mAP) of 81.8% on the public VOC dataset and 82.8% on a self-built aerial dataset of small targets. It trades some speed for a considerable gain in precision.

Key words: computer vision, feature fusion, receptive field, small target, deep learning
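To illustrate why a receptive field block can help with targets of different sizes, the sketch below computes the receptive field of a few convolution branches that differ in kernel size and dilation rate. It is only an illustration under stated assumptions: the function names and the three branch configurations are hypothetical and are not taken from the paper. Only the standard dilated-convolution arithmetic is used.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a single dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a stack of (kernel_size, stride, dilation) conv layers."""
    rf, jump = 1, 1  # jump: spacing, in input pixels, between adjacent outputs
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical branches (an assumption for illustration): a plain convolution
# followed by a 3x3 convolution whose dilation rate grows with the branch.
branches = {
    "1x1 -> 3x3 (dilation 1)": [(1, 1, 1), (3, 1, 1)],
    "3x3 -> 3x3 (dilation 3)": [(3, 1, 1), (3, 1, 3)],
    "5x5 -> 3x3 (dilation 5)": [(5, 1, 1), (3, 1, 5)],
}

for name, layers in branches.items():
    size = receptive_field(layers)
    print(f"{name}: receptive field {size}x{size}")
```

Under these assumed settings, the branches see 3×3, 9×9, and 15×15 input regions respectively, so fusing their outputs mixes fine local detail with wider context. This is the general motivation for multi-branch receptive field designs.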


