Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (11): 2547-2556. DOI: 10.3778/j.issn.1673-9418.2104122

• Artificial Intelligence •

Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting

YU Ying+ (余鹰), PAN Cheng (潘诚), ZHU Huilin (朱慧琳), QIAN Jin (钱进), TANG Hong (汤洪)

  1. College of Software, East China Jiaotong University, Nanchang 330013, China
  • Received: 2021-05-07 Revised: 2021-06-23 Published: 2022-11-01 Online: 2021-06-24
  • Corresponding author: + E-mail: yuyingjx@163.com
  • About author: YU Ying, born in 1979, Ph.D., associate professor, M.S. supervisor, member of CCF. Her research interests include machine learning, computer vision and granular computing.
    PAN Cheng, born in 1995, M.S. candidate. His research interests include machine learning and computer vision.
    ZHU Huilin, born in 1996, M.S. candidate. Her research interest is computer vision.
    QIAN Jin, born in 1975, Ph.D., professor, M.S. supervisor, member of CCF. His research interests include granular computing, big data mining and machine learning.
    TANG Hong, born in 1998, M.S. candidate. His research interests include deep learning and computer vision.
  • Supported by:
    National Natural Science Foundation of China (62163016, 62066014); Natural Science Foundation of Jiangxi Province (20212ACB202001, 20202BABL202018)

Abstract:

Crowd counting aims to accurately predict the number, distribution and density of people in real-world scenes. Such scenes, however, commonly exhibit complex backgrounds, diverse target scales and cluttered crowd distributions, which make the task highly challenging. To address these problems, an encoder-decoder crowd counting network fusing channel and spatial attention (CSANet) is proposed. The model uses a multi-level encoder-decoder structure to extract multi-scale semantic features and fully integrates spatial context information, thereby handling pedestrian scale variation and cluttered distribution in complex scenes. To reduce the impact of complex backgrounds on counting performance, channel and spatial attention are introduced during feature fusion: the feature weights of crowd regions are increased to highlight regions of interest, while the feature weights of weakly correlated background regions are decreased to suppress background noise, which ultimately improves the quality of the crowd density map. To verify the effectiveness of the algorithm, experiments are conducted on several classical crowd counting datasets. The results show that, compared with existing crowd counting algorithms, CSANet performs well in multi-scale feature extraction and background noise suppression, which considerably improves the accuracy and robustness of counting in dense scenes.

Key words: crowd counting, encoder-decoder network, attention, feature fusion, deep learning
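
The idea of reweighting fused features by channel and then by spatial position can be illustrated with a minimal sketch. This is not the CSANet implementation: the abstract does not specify the attention layers, so the code below assumes a simplified, parameter-free variant in which each weight is the sigmoid of an average activation, and it assumes encoder and decoder features are fused by element-wise addition. All function names (`channel_attention`, `spatial_attention`, `fuse_with_attention`) are hypothetical, and feature maps are plain nested lists of shape channels x height x width.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by the sigmoid of its global average.

    Assumption: a real model would pass the pooled vector through
    learned layers; here the pooled mean is used directly.
    """
    out = []
    for channel in feat:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        weight = sigmoid(mean)
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each position by the sigmoid of its mean across channels.

    Assumption: a real model would use a learned convolution to
    produce the spatial mask.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def fuse_with_attention(encoder_feat, decoder_feat):
    """Fuse two same-shaped feature maps, then apply both attentions.

    Assumption: fusion is element-wise addition, followed by channel
    attention and then spatial attention.
    """
    fused = [
        [[a + b for a, b in zip(row_e, row_d)] for row_e, row_d in zip(ch_e, ch_d)]
        for ch_e, ch_d in zip(encoder_feat, decoder_feat)
    ]
    return spatial_attention(channel_attention(fused))


if __name__ == "__main__":
    # Two channels on a 2x2 grid; only one position is strongly activated.
    feat = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[4.0, 0.0], [0.0, 0.0]],
    ]
    for channel in fuse_with_attention(feat, feat):
        print(channel)
```

Because every weight lies in (0, 1), this sketch can only attenuate features; strongly activated channels and positions are attenuated less than weak ones, which is the relative emphasis on crowd regions over background that the abstract describes.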

CLC Number: