Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (11): 2575-2586. DOI: 10.3778/j.issn.1673-9418.2102001

• Graphics and Image •

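SSD Object Detection Algorithm with Attention and Cross-Scale Fusion

LI Qingyuan1, DENG Zhaohong1,2,3,+, LUO Xiaoqing1, GU Xin4, WANG Shitong1

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. Key Laboratory of Computational Neuroscience and Brain-Like Intelligence, Ministry of Education, Fudan University, Shanghai 200433, China
  3. Zhangjiang Laboratory, Shanghai 200120, China
  4. Jiangsu North Huguang Photoelectric Co., Ltd., Wuxi, Jiangsu 214035, China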
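  • Received: 2021-02-01 Revised: 2021-03-18 Online: 2022-11-01 Published: 2021-03-25
  • Corresponding author: + E-mail: dengzhaohong@jiangnan.edu.cn
  • About the authors: LI Qingyuan (born 1997), male, from Weifang, Shandong, M.S. candidate. His research interest is deep learning.
    DENG Zhaohong (born 1981), male, from Mengcheng, Anhui, professor, senior member of CCF. His research interests include uncertainty artificial intelligence and its applications.
    LUO Xiaoqing (born 1980), female, from Nanchang, Jiangxi, associate professor. Her research interests include image fusion, pattern recognition, and image processing.
    GU Xin (born 1979), male, from Zhangjiagang, Jiangsu, Ph.D., senior engineer. His research interests include pattern recognition and the research and application of artificial intelligence image processing technology.
    WANG Shitong (born 1964), male, from Yangzhou, Jiangsu, professor, Ph.D. supervisor. His research interests include artificial intelligence and pattern recognition.
  • Supported by:
    National Natural Science Foundation of China, General Program (61772239); Municipal Major Science and Technology Project of Shanghai (2018SHZDZX01)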

Abstract:

To further improve the performance of the SSD (single shot multibox detector) algorithm and to address the unbalanced feature-map information and the difficulty of recognizing small objects in its multi-scale prediction, this paper designs plug-and-play modules that fully fuse the information contained in feature maps of different scales and model the importance relationships within each feature map, thereby enhancing the representation ability of the feature maps. Firstly, a novel feature fusion method is designed to resolve the information disparity that arises in cross-scale feature fusion. Secondly, following the idea of the pooling pyramid, a deep feature extraction module is designed to extract information from different receptive fields, which improves the model's ability to detect objects of different sizes. Finally, a lightweight attention module is proposed to further refine the feature maps: it highlights the information that is useful for the current task and models both the long-range relationships among all pixels and the importance relationships among channels. With these mechanisms, the architecture of the SSD model is modified, which effectively improves the detection accuracy and robustness of the SSD algorithm. Extensive experiments on the PASCAL VOC datasets verify the effectiveness of the proposed method. On the PASCAL VOC2007 test set, the proposed method improves mean average precision (mAP) by 2.9 percentage points over the SSD algorithm while retaining real-time detection capability.

Key words: object detection, feature fusion, attention mechanism, deep learning
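
The abstract does not specify how the attention module computes the importance relationships among channels. As a rough illustration only, the pure-Python sketch below shows a generic squeeze-and-excitation-style channel gate. It is not the authors' module: the function name `channel_attention`, the nested-list feature-map layout, and the parameter-free sigmoid gate are all assumptions made for this example.

```python
import math


def channel_attention(feature_map):
    """Reweight channels by a gate computed from their global average.

    feature_map: list of C channels, each an H x W list of lists of floats.
    (Hypothetical layout chosen for illustration, not taken from the paper.)
    Returns (weights, reweighted_feature_map).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    descriptors = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excite: a sigmoid maps each descriptor to a weight in (0, 1).
    # Assumption: a trained module would use learned layers here;
    # this sketch has no learnable parameters.
    weights = [1.0 / (1.0 + math.exp(-d)) for d in descriptors]
    # Rescale every value in a channel by that channel's weight.
    reweighted = [
        [[w * value for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return weights, reweighted


# Two 2x2 channels: one all zeros, one all twos.
fm = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
weights, out = channel_attention(fm)
```

In this toy gate, a channel with a larger mean activation receives a weight closer to 1, so its values are preserved more strongly than those of a weakly activated channel. A practical attention module would learn the gate from data rather than apply a fixed sigmoid.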

CLC number: