Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (11): 2547-2556. DOI: 10.3778/j.issn.1673-9418.2104122
Corresponding author: E-mail: yuyingjx@163.com
YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
Received: 2021-05-07
Revised: 2021-06-23
Online: 2022-11-01
Published: 2021-06-24
About author: YU Ying, born in 1979, Ph.D., associate professor, M.S. supervisor, member of CCF. Her research interests include machine learning, computer vision and granular computing.
Abstract:
Crowd counting aims to accurately predict the number, distribution and density of crowds in real-world scenes. However, real scenes commonly involve complex backgrounds, diverse target scales and cluttered crowd distributions, which pose great challenges to the counting task. To address these problems, this paper proposes an encoder-decoder crowd counting network fusing channel and spatial attention (CSANet). The model adopts a multi-level encoder-decoder structure to extract multi-scale semantic features and fully fuses spatial context information, thereby handling the scale variation and cluttered distribution of pedestrians in complex scenes. To reduce the impact of complex backgrounds on counting performance, channel and spatial attention are introduced during feature fusion: the feature weights of crowd regions are increased to highlight regions of interest, while those of weakly correlated background regions are decreased to suppress background noise, ultimately improving the quality of the predicted density map. To verify the effectiveness of the algorithm, experiments are conducted on several classic crowd counting datasets. The results show that, compared with existing crowd counting algorithms, CSANet has good multi-scale feature extraction and background noise suppression capabilities, which considerably improve the accuracy and robustness of counting in dense scenes.
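The channel and spatial attention used here follows CBAM (Woo et al., reference [23]): channel attention pools each feature map globally and reweights channels through a shared MLP, then spatial attention pools across channels and reweights locations. A minimal NumPy sketch of that two-stage reweighting — with randomly initialized stand-in MLP weights, and a box filter standing in for CBAM's learned 7×7 convolution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """f: (C, H, W); w1 (C//r, C) and w2 (C, C//r) form the shared MLP."""
    avg = f.mean(axis=(1, 2))                      # global average pool -> (C,)
    mx = f.max(axis=(1, 2))                        # global max pool -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # FC -> ReLU -> FC
    scale = sigmoid(mlp(avg) + mlp(mx))            # per-channel weights in (0, 1)
    return f * scale[:, None, None]

def spatial_attention(f, k=7):
    """Pool over channels, then a k x k box filter stands in for the learned conv."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])   # (2, H, W)
    pad = k // 2
    p = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = f.shape[1], f.shape[2]
    conv = np.array([[p[:, i:i + k, j:j + k].sum() for j in range(w)]
                     for i in range(h)])
    return f * sigmoid(conv)[None]                 # per-location weights in (0, 1)

def cbam(f, w1, w2):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(f, w1, w2))
```

Both attention maps are sigmoid outputs in (0, 1), so the module only rescales features: crowd regions keep most of their activation while background channels and locations are damped, which is the noise-suppression effect described in the abstract.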
YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong. Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2547-2556.
Table 1 Network parameters
| Encoder | Decoder |
| --- | --- |
| Conv1_1(3-64-1) | Upsampling |
| Conv1_2(3-64-1) | Concat |
| Max pooling | CBAM module |
| Conv2_1(3-128-1) | Conv6_1(1-256-1) |
| Conv2_2(3-128-1) | Conv6_2(3-256-1) |
| Max pooling | Upsampling |
| Conv3_1(3-256-1) | Concat |
| Conv3_2(3-256-1) | CBAM module |
| Conv3_3(3-256-1) | Conv7_1(1-128-1) |
| Max pooling | Conv7_2(3-128-1) |
| Conv4_1(3-512-1) | Upsampling |
| Conv4_2(3-512-1) | Concat |
| Conv4_3(3-512-1) | CBAM module |
| Max pooling | Conv8_1(1-64-1) |
| Conv5_1(3-512-1) | Conv8_2(3-64-1) |
| Conv5_2(3-512-1) | Upsampling |
| Conv5_3(3-512-1) | Conv9_1(3-32-1) |
|  | Conv9_2(3-32-1) |
|  | Conv10_1(1-1-1) |
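Reading each triplet in Table 1 as (kernel size - output channels - stride) — an assumption based on common notation, since the paper's legend is not reproduced here — the encoder column matches the 13 convolutional layers of VGG-16 (pooling layers carry no weights). A short script can check the implied parameter count:

```python
def conv_params(layers, in_ch=3):
    """Total weights + biases for a chain of k x k convolution layers."""
    total = 0
    for k, out_ch in layers:
        total += k * k * in_ch * out_ch + out_ch  # kernel weights + one bias per filter
        in_ch = out_ch                            # next layer consumes this output
    return total

# Encoder column of Table 1: (kernel, output channels) per conv layer
encoder = [(3, 64), (3, 64),
           (3, 128), (3, 128),
           (3, 256), (3, 256), (3, 256),
           (3, 512), (3, 512), (3, 512),
           (3, 512), (3, 512), (3, 512)]

print(conv_params(encoder))  # 14714688, the familiar VGG-16 conv-layer total
```

The match to VGG-16's 14.7M convolutional parameters is consistent with the common practice (e.g. CSRNet [22]) of initializing the encoder from an ImageNet-pretrained VGG-16 front end.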
Table 2 Performance comparison of different methods on ShanghaiTech dataset
| Method | Part_A MAE | Part_A RMSE | Part_B MAE | Part_B RMSE |
| --- | --- | --- | --- | --- |
| MCNN[5] | 110.2 | 173.2 | 26.4 | 41.3 |
| CP-CNN[19] | 73.6 | 106.4 | 20.1 | 30.1 |
| CSRNet[22] | 68.2 | 115.0 | 10.6 | 16.0 |
| ASD[25] | 65.6 | 98.0 | 8.5 | 13.7 |
| PACNN[26] | 66.3 | 106.4 | 8.9 | 13.5 |
| SFANet[27] | 59.8 | 99.3 | 6.9 | 10.9 |
| DUBNet[28] | 64.6 | 106.8 | 7.8 | 12.2 |
| CSANet | 59.4 | 97.1 | 7.1 | 11.2 |
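The MAE and RMSE columns are the standard crowd counting metrics: mean absolute error and root mean squared error between predicted and ground-truth counts over all test images. A minimal sketch:

```python
import math

def mae(pred, gt):
    """Mean absolute counting error over a test set."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def rmse(pred, gt):
    """Root mean squared counting error; penalizes large misses more heavily."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))
```

RMSE is never smaller than MAE, so a wide gap between the two (e.g. the Part_A columns above) indicates a few images with large count errors rather than uniformly moderate ones.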
Table 3 Performance comparison of different methods on UCF_QNRF dataset
| Method | MAE | RMSE |
| --- | --- | --- |
| MCNN[5] | 277.0 | 426.0 |
| CMTL[30] | 252.0 | 514.0 |
| IA-DCCN[31] | 125.0 | 186.0 |
| RANet[32] | 111.0 | 190.0 |
| DUBNet[28] | 105.6 | 180.5 |
| CSANet | 104.8 | 175.4 |
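The counts behind these comparisons are obtained by summing the predicted density map. Ground truth is conventionally built by placing a normalized 2-D Gaussian at each annotated head position, so that the map integrates to the true count. A sketch under that convention (fixed σ for simplicity; real pipelines often use geometry-adaptive kernels):

```python
import numpy as np

def density_map(points, h, w, sigma=4.0):
    """One normalized 2-D Gaussian per annotated head; the map sums to len(points)."""
    d = np.zeros((h, w))
    ys, xs = np.mgrid[0:h, 0:w]
    for px, py in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        d += g / g.sum()  # each person contributes exactly 1 to the integral
    return d
```

Because each Gaussian is renormalized after being clipped to the image, the estimated count is simply `d.sum()`, which is what MAE and RMSE compare against the annotation count.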
Table 4 Performance comparison of different methods on UCF_CC_50 dataset
| Method | MAE | RMSE |
| --- | --- | --- |
| MCNN[5] | 377.6 | 509.1 |
| Switch-CNN[20] | 318.1 | 439.2 |
| CP-CNN[19] | 295.8 | 320.9 |
| CSRNet[22] | 266.1 | 397.5 |
| ASD[25] | 196.2 | 270.9 |
| PACNN[26] | 267.9 | 357.8 |
| CSANet | 191.3 | 262.8 |
Table 5 Ablation study on ShanghaiTech dataset
| Module | Part_A MAE | Part_A RMSE | Part_B MAE | Part_B RMSE |
| --- | --- | --- | --- | --- |
| Backbone | 60.9 | 99.2 | 7.7 | 12.2 |
| Backbone + attention | 58.5 | 96.1 | 7.1 | 11.2 |
[1] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2021-01-06]. https://arxiv.org/abs/1409.1556.
[2] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[3] HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 1580-1589.
[4] ZHANG C, LI H S, WANG X G, et al. Cross-scene crowd counting via deep convolutional neural networks[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 833-841.
[5] ZHANG Y Y, ZHOU D, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 589-597.
[6] LIU N, LONG Y C, ZOU C Q, et al. ADCrowdNet: an attention-injective deformable convolutional network for crowd understanding[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 3225-3234.
[7] VIOLA P, JONES M J. Robust real-time face detection[J]. International Journal of Computer Vision, 2004, 57(2): 137-154.
[8] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893.
[9] DOLLAR P, WOJEK C, SCHIELE B, et al. Pedestrian detection: an evaluation of the state of the art[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 34(4): 743-761.
[10] FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 32(9): 1627-1645.
[11] CHEN K, GONG S G, XIANG T, et al. Cumulative attribute space for age and crowd density estimation[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 2467-2474.
[12] HOWARD A, SANDLER M, CHU G, et al. Searching for MobileNetV3[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 1314-1324.
[13] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[EB/OL]. (2016-01-06)[2021-01-06]. https://arxiv.org/abs/1506.01497.
[14] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2021-01-06]. https://arxiv.org/abs/2004.10934.
[15] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 833-851.
[16] YU Y, ZHU H L, QIAN J, et al. Survey on deep learning based crowd counting[J]. Journal of Computer Research and Development, 2021, 58(12): 2724-2747.
[17] GAO G S, GAO J Y, LIU Q J, et al. CNN-based density estimation and crowd counting: a survey[EB/OL]. (2020-03-28)[2021-01-06]. https://arxiv.org/abs/2003.12783.
[18] YU Y, ZHU H L, WANG L W, et al. Dense crowd counting based on adaptive scene division[J]. International Journal of Machine Learning and Cybernetics, 2021, 12(4): 931-942.
[19] SINDAGI V A, PATEL V M. Generating high-quality crowd density maps using contextual pyramid CNNs[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1861-1870.
[20] SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 4031-4039.
[21] CAO X K, WANG Z P, ZHAO Y Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]// LNCS 11209: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 734-750.
[22] LI Y H, ZHANG X F, CHEN D M. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 1091-1100.
[23] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[24] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, Jun 20-25, 2009. Washington: IEEE Computer Society, 2009: 248-255.
[25] WU X J, ZHENG Y B, YE H, et al. Adaptive scenario discovery for crowd counting[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, May 12-17, 2019. Piscataway: IEEE, 2019: 2382-2386.
[26] SHI M J, YANG Z H, XU C, et al. Revisiting perspective information for efficient crowd counting[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 7279-7288.
[27] ZHU L, ZHAO Z J, LU C, et al. Dual path multi-scale fusion networks with attention for crowd counting[EB/OL]. (2019-02-04)[2021-01-06]. https://arxiv.org/abs/1902.01115.
[28] OH M, OLSEN P, RAMAMURTHY K N. Crowd counting with decomposed uncertainty[C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI Press, 2020: 11799-11806.
[29] IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]// LNCS 11206: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 544-559.
[30] SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting[C]// Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, Aug 29-Sep 1, 2017. Washington: IEEE Computer Society, 2017: 1-6.
[31] SINDAGI V A, PATEL V M. Inverse attention guided deep crowd counting network[C]// Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance, Taipei, China, Sep 18-21, 2019. Piscataway: IEEE, 2019: 1-8.
[32] ZHANG A R, SHEN J Y, XIAO Z H, et al. Relational attention network for crowd counting[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6788-6797.
[33] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 2547-2554.