Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (11): 2547-2556. DOI: 10.3778/j.issn.1673-9418.2104122
Corresponding author: E-mail: yuyingjx@163.com
YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
Received: 2021-05-07
Revised: 2021-06-23
Online: 2022-11-01
Published: 2021-06-24
About author: YU Ying, born in 1979, Ph.D., associate professor, M.S. supervisor, member of CCF. Her research interests include machine learning, computer vision and granular computing.
Abstract:
Crowd counting aims to accurately predict the number, distribution and density of crowds in real-world scenes. However, real scenes commonly involve complex backgrounds, diverse target scales and cluttered crowd distributions, which pose great challenges to the counting task. To address these problems, this paper proposes an encoder-decoder crowd counting network fusing channel and spatial attention (CSANet). The model adopts a multi-level encoder-decoder structure to extract multi-scale semantic features and fully fuses spatial context information, thereby handling the scale variation and cluttered distribution of pedestrians in complex scenes. To reduce the impact of complex backgrounds on counting performance, channel and spatial attention are introduced during feature fusion: the feature weights of crowd regions are increased to highlight regions of interest, while those of weakly correlated background regions are decreased to suppress background noise, ultimately improving the quality of the predicted density map. To verify the effectiveness of the algorithm, experiments are conducted on several classic crowd counting datasets. The results show that, compared with existing crowd counting algorithms, CSANet has good multi-scale feature extraction and background noise suppression capabilities, which considerably improve the accuracy and robustness of counting in dense scenes.
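The channel and spatial attention used here follows CBAM (Woo et al., reference [23]): channel attention pools each feature map globally and reweights channels through a shared MLP, then spatial attention pools across channels and reweights locations. A minimal NumPy sketch of that two-stage reweighting — with randomly initialized stand-in MLP weights, and a box filter standing in for CBAM's learned 7×7 convolution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """f: (C, H, W); w1 (C//r, C) and w2 (C, C//r) form the shared MLP."""
    avg = f.mean(axis=(1, 2))                      # global average pool -> (C,)
    mx = f.max(axis=(1, 2))                        # global max pool -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # FC -> ReLU -> FC
    scale = sigmoid(mlp(avg) + mlp(mx))            # per-channel weights in (0, 1)
    return f * scale[:, None, None]

def spatial_attention(f, k=7):
    """Pool over channels, then a k x k box filter stands in for the learned conv."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])   # (2, H, W)
    pad = k // 2
    p = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = f.shape[1], f.shape[2]
    conv = np.array([[p[:, i:i + k, j:j + k].sum() for j in range(w)]
                     for i in range(h)])
    return f * sigmoid(conv)[None]                 # per-location weights in (0, 1)

def cbam(f, w1, w2):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(f, w1, w2))
```

Both attention maps are sigmoid outputs in (0, 1), so the module only rescales features: crowd regions keep most of their activation while background channels and locations are damped, which is the noise-suppression effect described in the abstract.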
YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong. Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2547-2556.
Table 1 Network parameters
| Encoder | Decoder |
| --- | --- |
| Conv1_1(3-64-1) | Upsampling |
| Conv1_2(3-64-1) | Concat |
| Max pooling | CBAM module |
| Conv2_1(3-128-1) | Conv6_1(1-256-1) |
| Conv2_2(3-128-1) | Conv6_2(3-256-1) |
| Max pooling | Upsampling |
| Conv3_1(3-256-1) | Concat |
| Conv3_2(3-256-1) | CBAM module |
| Conv3_3(3-256-1) | Conv7_1(1-128-1) |
| Max pooling | Conv7_2(3-128-1) |
| Conv4_1(3-512-1) | Upsampling |
| Conv4_2(3-512-1) | Concat |
| Conv4_3(3-512-1) | CBAM module |
| Max pooling | Conv8_1(1-64-1) |
| Conv5_1(3-512-1) | Conv8_2(3-64-1) |
| Conv5_2(3-512-1) | Upsampling |
| Conv5_3(3-512-1) | Conv9_1(3-32-1) |
|  | Conv9_2(3-32-1) |
|  | Conv10_1(1-1-1) |
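Reading each triplet in Table 1 as (kernel size - output channels - stride) — an assumption based on common notation, since the paper's legend is not reproduced here — the encoder column matches the 13 convolutional layers of VGG-16 (pooling layers carry no weights). A short script can check the implied parameter count:

```python
def conv_params(layers, in_ch=3):
    """Total weights + biases for a chain of k x k convolution layers."""
    total = 0
    for k, out_ch in layers:
        total += k * k * in_ch * out_ch + out_ch  # kernel weights + one bias per filter
        in_ch = out_ch                            # next layer consumes this output
    return total

# Encoder column of Table 1: (kernel, output channels) per conv layer
encoder = [(3, 64), (3, 64),
           (3, 128), (3, 128),
           (3, 256), (3, 256), (3, 256),
           (3, 512), (3, 512), (3, 512),
           (3, 512), (3, 512), (3, 512)]

print(conv_params(encoder))  # 14714688, the familiar VGG-16 conv-layer total
```

The match to VGG-16's 14.7M convolutional parameters is consistent with the common practice (e.g. CSRNet [22]) of initializing the encoder from an ImageNet-pretrained VGG-16 front end.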
Table 2 Performance comparison of different methods on ShanghaiTech dataset
| Method | Part_A MAE | Part_A RMSE | Part_B MAE | Part_B RMSE |
| --- | --- | --- | --- | --- |
| MCNN[5] | 110.2 | 173.2 | 26.4 | 41.3 |
| CP-CNN[19] | 73.6 | 106.4 | 20.1 | 30.1 |
| CSRNet[22] | 68.2 | 115.0 | 10.6 | 16.0 |
| ASD[25] | 65.6 | 98.0 | 8.5 | 13.7 |
| PACNN[26] | 66.3 | 106.4 | 8.9 | 13.5 |
| SFANet[27] | 59.8 | 99.3 | 6.9 | 10.9 |
| DUBNet[28] | 64.6 | 106.8 | 7.8 | 12.2 |
| CSANet | 59.4 | 97.1 | 7.1 | 11.2 |
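The MAE and RMSE columns are the standard crowd counting metrics: mean absolute error and root mean squared error between predicted and ground-truth counts over all test images. A minimal sketch:

```python
import math

def mae(pred, gt):
    """Mean absolute counting error over a test set."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def rmse(pred, gt):
    """Root mean squared counting error; penalizes large misses more heavily."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))
```

RMSE is never smaller than MAE, so a wide gap between the two (e.g. the Part_A columns above) indicates a few images with large count errors rather than uniformly moderate ones.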
Table 3 Performance comparison of different methods on UCF_QNRF dataset
| Method | MAE | RMSE |
| --- | --- | --- |
| MCNN[5] | 277.0 | 426.0 |
| CMTL[30] | 252.0 | 514.0 |
| IA-DCCN[31] | 125.0 | 186.0 |
| RANet[32] | 111.0 | 190.0 |
| DUBNet[28] | 105.6 | 180.5 |
| CSANet | 104.8 | 175.4 |
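The counts behind these comparisons are obtained by summing the predicted density map. Ground truth is conventionally built by placing a normalized 2-D Gaussian at each annotated head position, so that the map integrates to the true count. A sketch under that convention (fixed σ for simplicity; real pipelines often use geometry-adaptive kernels):

```python
import numpy as np

def density_map(points, h, w, sigma=4.0):
    """One normalized 2-D Gaussian per annotated head; the map sums to len(points)."""
    d = np.zeros((h, w))
    ys, xs = np.mgrid[0:h, 0:w]
    for px, py in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        d += g / g.sum()  # each person contributes exactly 1 to the integral
    return d
```

Because each Gaussian is renormalized after being clipped to the image, the estimated count is simply `d.sum()`, which is what MAE and RMSE compare against the annotation count.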
Table 4 Performance comparison of different methods on UCF_CC_50 dataset
| Method | MAE | RMSE |
| --- | --- | --- |
| MCNN[5] | 377.6 | 509.1 |
| Switch-CNN[20] | 318.1 | 439.2 |
| CP-CNN[19] | 295.8 | 320.9 |
| CSRNet[22] | 266.1 | 397.5 |
| ASD[25] | 196.2 | 270.9 |
| PACNN[26] | 267.9 | 357.8 |
| CSANet | 191.3 | 262.8 |
Table 5 Ablation study on ShanghaiTech dataset
| Module | Part_A MAE | Part_A RMSE | Part_B MAE | Part_B RMSE |
| --- | --- | --- | --- | --- |
| Backbone | 60.9 | 99.2 | 7.7 | 12.2 |
| Backbone + attention | 58.5 | 96.1 | 7.1 | 11.2 |
[1] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2021-01-06]. https://arxiv.org/abs/1409.1556.
[2] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[3] HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 1580-1589.
[4] ZHANG C, LI H S, WANG X G, et al. Cross-scene crowd counting via deep convolutional neural networks[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 833-841.
[5] ZHANG Y Y, ZHOU D, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 589-597.
[6] LIU N, LONG Y C, ZOU C Q, et al. ADCrowdNet: an attention-injective deformable convolutional network for crowd understanding[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 3225-3234.
[7] VIOLA P, JONES M J. Robust real-time face detection[J]. International Journal of Computer Vision, 2004, 57(2): 137-154.
[8] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893.
[9] DOLLAR P, WOJEK C, SCHIELE B, et al. Pedestrian detection: an evaluation of the state of the art[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 34(4): 743-761.
[10] FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 32(9): 1627-1645.
[11] CHEN K, GONG S G, XIANG T, et al. Cumulative attribute space for age and crowd density estimation[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 2467-2474.
[12] HOWARD A, SANDLER M, CHU G, et al. Searching for MobileNetV3[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 1314-1324.
[13] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[EB/OL]. (2016-01-06)[2021-01-06]. https://arxiv.org/abs/1506.01497.
[14] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2021-01-06]. https://arxiv.org/abs/2004.10934.
[15] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 833-851.
[16] YU Y, ZHU H L, QIAN J, et al. Survey on deep learning based crowd counting[J]. Journal of Computer Research and Development, 2021, 58(12): 2724-2747.
[17] GAO G S, GAO J Y, LIU Q J, et al. CNN-based density estimation and crowd counting: a survey[EB/OL]. (2020-03-28)[2021-01-06]. https://arxiv.org/abs/2003.12783.
[18] YU Y, ZHU H L, WANG L W, et al. Dense crowd counting based on adaptive scene division[J]. International Journal of Machine Learning and Cybernetics, 2021, 12(4): 931-942.
[19] SINDAGI V A, PATEL V M. Generating high-quality crowd density maps using contextual pyramid CNNs[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1861-1870.
[20] SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 4031-4039.
[21] CAO X K, WANG Z P, ZHAO Y Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]// LNCS 11209: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 734-750.
[22] LI Y H, ZHANG X F, CHEN D M. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 1091-1100.
[23] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[24] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, Jun 20-25, 2009. Washington: IEEE Computer Society, 2009: 248-255.
[25] WU X J, ZHENG Y B, YE H, et al. Adaptive scenario discovery for crowd counting[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, May 12-17, 2019. Piscataway: IEEE, 2019: 2382-2386.
[26] SHI M J, YANG Z H, XU C, et al. Revisiting perspective information for efficient crowd counting[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 7279-7288.
[27] ZHU L, ZHAO Z J, LU C, et al. Dual path multi-scale fusion networks with attention for crowd counting[EB/OL]. (2019-02-04)[2021-01-06]. https://arxiv.org/abs/1902.01115.
[28] OH M, OLSEN P, RAMAMURTHY K N. Crowd counting with decomposed uncertainty[C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI Press, 2020: 11799-11806.
[29] IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]// LNCS 11206: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 544-559.
[30] SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting[C]// Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, Aug 29-Sep 1, 2017. Washington: IEEE Computer Society, 2017: 1-6.
[31] SINDAGI V A, PATEL V M. Inverse attention guided deep crowd counting network[C]// Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance, Taipei, China, Sep 18-21, 2019. Piscataway: IEEE, 2019: 1-8.
[32] ZHANG A R, SHEN J Y, XIAO Z H, et al. Relational attention network for crowd counting[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6788-6797.
[33] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 2547-2554.