Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (1): 217-230. DOI: 10.3778/j.issn.1673-9418.2007059

• Graphics and Image •

  • Corresponding author (+): LI Zhixin, E-mail: lizx@gxnu.edu.cn

Combining Cascaded Network and Adversarial Network for Object Detection

LI Zhixin1,+, CHEN Shengjia1, ZHOU Tao1, MA Huifang2

  1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004, China
    2. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  • Received: 2020-07-03 Revised: 2020-09-09 Published: 2022-01-01 Online: 2020-09-25
  • About authors: LI Zhixin, born in 1971, Ph.D., professor, Ph.D. supervisor, senior member of CCF. His research interests include image understanding, machine learning, natural language processing and cross-media computing.
    CHEN Shengjia, born in 1994, M.S. His research interests include machine learning and image understanding.
    ZHOU Tao, born in 1993, M.S. His research interests include machine learning and image understanding.
    MA Huifang, born in 1981, Ph.D., professor, M.S. supervisor, member of CCF. Her research interests include data mining and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61966004, 61663004, 61762078, 61866004); Natural Science Foundation of Guangxi (2019GXNSFDA245018, 2018GXNSFDA281009, 2017GXNSFAA198365); Innovation Project of Guangxi Graduate Education (YCSW2020111)


Abstract:

Recognizing multi-scale objects and occluded objects is a key difficulty in object detection. To detect objects of different sizes, object detectors usually rely on the hierarchy of multi-scale feature maps built by a convolutional neural network (CNN). However, because the convolutional layers of the bottom feature maps are small, this top-down structure lacks the detailed information needed to capture the features of small objects, which limits the performance of these detectors. Therefore, building on the Faster R-CNN (region-based convolutional neural network) framework, this paper proposes Collaborative R-CNN. It uses a cascaded network structure that fuses multi-scale feature maps into deeply fused feature information, strengthening the detailed features that small objects require and thereby improving small-object detection. Moreover, the quantization in RoIPooling severely limits the recognition of small objects. To further improve robustness, a multi-scale RoIAlign is designed to eliminate this quantization, and multi-scale pooling improves the network's ability to detect objects of different scales. Finally, an adversarial network is combined with the proposed cascaded network to generate training samples containing occluded objects, which significantly improves the model's classification ability and its robustness in detecting occluded objects. Experimental results on the PASCAL VOC 2012 and PASCAL VOC 2007 datasets show that the proposed approach outperforms several state-of-the-art approaches.

Key words: object detection, convolutional neural network (CNN), feature fusion, cascaded network, adversarial network
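The abstract's point about RoIPooling quantization can be illustrated with a small, self-contained sketch. The Python code below is not the authors' implementation. It contrasts a Fast R-CNN-style RoI max pooling, which rounds the projected box and its bin edges to whole feature-map cells, with a Mask R-CNN-style RoIAlign, which keeps real-valued coordinates and averages bilinearly interpolated samples in each bin. The toy single-channel feature map, the stride of 16 (`spatial_scale = 1/16`), the sampling ratio of 2, and the helper `multi_scale_roi_align` (which simply pools the same RoI at several output grid sizes) are all illustrative assumptions; the paper's multi-scale RoIAlign may combine scales differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

FeatureMap = List[List[float]]           # toy single-channel H x W feature map
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def roi_pool(feat: FeatureMap, roi: Box, out_size: int, spatial_scale: float) -> FeatureMap:
    """Fast R-CNN-style RoI max pooling with two quantization (rounding) steps."""
    h, w = len(feat), len(feat[0])
    # Quantization 1: snap the projected box corners to integer feature-map cells.
    x1, y1, x2, y2 = (math.floor(c * spatial_scale + 0.5) for c in roi)
    bin_w = max(x2 - x1 + 1, 1) / out_size
    bin_h = max(y2 - y1 + 1, 1) / out_size
    out = []
    for i in range(out_size):
        # Quantization 2: bin boundaries are floored/ceiled to whole cells.
        hs = min(max(y1 + math.floor(i * bin_h), 0), h)
        he = min(max(y1 + math.ceil((i + 1) * bin_h), 0), h)
        row = []
        for j in range(out_size):
            ws = min(max(x1 + math.floor(j * bin_w), 0), w)
            we = min(max(x1 + math.ceil((j + 1) * bin_w), 0), w)
            if he <= hs or we <= ws:
                row.append(0.0)  # empty bin
            else:
                row.append(max(feat[y][x] for y in range(hs, he) for x in range(ws, we)))
        out.append(row)
    return out


def bilinear(feat: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate feat at a real-valued location (no rounding)."""
    h, w = len(feat), len(feat[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0  # sample falls outside the map
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feat[y0][x0] + (1 - ly) * lx * feat[y0][x1]
            + ly * (1 - lx) * feat[y1][x0] + ly * lx * feat[y1][x1])


def roi_align(feat: FeatureMap, roi: Box, out_size: int, spatial_scale: float,
              sampling_ratio: int = 2) -> FeatureMap:
    """RoIAlign: keep real-valued coordinates and average
    sampling_ratio x sampling_ratio bilinear samples per bin."""
    x1, y1, x2, y2 = (c * spatial_scale for c in roi)  # no rounding here
    bin_w = max(x2 - x1, 1.0) / out_size
    bin_h = max(y2 - y1, 1.0) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    y = y1 + i * bin_h + (sy + 0.5) * bin_h / sampling_ratio
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / sampling_ratio
                    total += bilinear(feat, y, x)
            row.append(total / sampling_ratio ** 2)
        out.append(row)
    return out


def multi_scale_roi_align(feat: FeatureMap, roi: Box, out_sizes: Sequence[int],
                          spatial_scale: float, sampling_ratio: int = 2) -> Dict[int, FeatureMap]:
    """Hypothetical multi-scale variant: pool the same RoI at several grid sizes."""
    return {s: roi_align(feat, roi, s, spatial_scale, sampling_ratio) for s in out_sizes}


if __name__ == "__main__":
    # Toy feature map whose value equals the column index (a horizontal ramp).
    feat = [[float(x) for x in range(8)] for _ in range(8)]
    scale = 1 / 16             # assumed backbone stride of 16
    box_a = (32, 32, 96, 96)
    box_b = (38, 32, 102, 96)  # same box shifted right by 6 px (< one stride)
    print("RoIPool :", roi_pool(feat, box_a, 2, scale), roi_pool(feat, box_b, 2, scale))
    print("RoIAlign:", roi_align(feat, box_a, 2, scale), roi_align(feat, box_b, 2, scale))
    print("multi-scale:", multi_scale_roi_align(feat, box_a, (1, 2), scale))
```

On this toy map, shifting the box right by 6 pixels (less than one 16-pixel stride) leaves the RoIPool output unchanged at [[4.0, 6.0], [4.0, 6.0]], because both boxes snap to the same cells. RoIAlign, by contrast, moves from [[3.0, 5.0], [3.0, 5.0]] to [[3.375, 5.375], [3.375, 5.375]], tracking the 6/16 = 0.375-cell shift. For a small object that covers only a few feature-map cells, an offset of this size is a large fraction of the object.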
