Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (11): 2547-2556.DOI: 10.3778/j.issn.1673-9418.2104122
• Artificial Intelligence •
YU Ying+, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong
Received: 2021-05-07
Revised: 2021-06-23
Online: 2022-11-01
Published: 2021-06-24
About author:
YU Ying, born in 1979, Ph.D., associate professor, M.S. supervisor, member of CCF. Her research interests include machine learning, computer vision, granular computing, etc.
Corresponding author:
+ E-mail: yuyingjx@163.com
YU Ying, PAN Cheng, ZHU Huilin, QIAN Jin, TANG Hong. Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2547-2556.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2104122
Encoder | Decoder |
---|---|
Conv1_1(3-64-1) | Upsampling |
Conv1_2(3-64-1) | Concat |
Max pooling | CBAM module |
Conv2_1(3-128-1) | Conv6_1(1-256-1) |
Conv2_2(3-128-1) | Conv6_2(3-256-1) |
Max pooling | Upsampling |
Conv3_1(3-256-1) | Concat |
Conv3_2(3-256-1) | CBAM module |
Conv3_3(3-256-1) | Conv7_1(1-128-1) |
Max pooling | Conv7_2(3-128-1) |
Conv4_1(3-512-1) | Upsampling |
Conv4_2(3-512-1) | Concat |
Conv4_3(3-512-1) | CBAM module |
Max pooling | Conv8_1(1-64-1) |
Conv5_1(3-512-1) | Conv8_2(3-64-1) |
Conv5_2(3-512-1) | Upsampling |
Conv5_3(3-512-1) | Conv9_1(3-32-1) |
 | Conv9_2(3-32-1) |
 | Conv10_1(1-1-1) |
Table 1 Network parameters
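The layer list in Table 1 can be read as a U-shaped pipeline. The following is a minimal sketch, not the authors' implementation, that traces the relative feature-map scale and channel count through both columns. It rests on several assumptions that the table does not state: each max pooling halves the resolution, each upsampling doubles it, the second field of `Conv(k-c-d)` is the number of output channels, and each `Concat` joins the decoder feature with the last encoder feature at the same scale. The names `ENCODER`, `DECODER`, and `trace` are illustrative only.

```python
import re
from fractions import Fraction

# Layer lists copied from Table 1, top to bottom.
ENCODER = [
    "Conv1_1(3-64-1)", "Conv1_2(3-64-1)", "Max pooling",
    "Conv2_1(3-128-1)", "Conv2_2(3-128-1)", "Max pooling",
    "Conv3_1(3-256-1)", "Conv3_2(3-256-1)", "Conv3_3(3-256-1)", "Max pooling",
    "Conv4_1(3-512-1)", "Conv4_2(3-512-1)", "Conv4_3(3-512-1)", "Max pooling",
    "Conv5_1(3-512-1)", "Conv5_2(3-512-1)", "Conv5_3(3-512-1)",
]
DECODER = [
    "Upsampling", "Concat", "CBAM module", "Conv6_1(1-256-1)", "Conv6_2(3-256-1)",
    "Upsampling", "Concat", "CBAM module", "Conv7_1(1-128-1)", "Conv7_2(3-128-1)",
    "Upsampling", "Concat", "CBAM module", "Conv8_1(1-64-1)", "Conv8_2(3-64-1)",
    "Upsampling", "Conv9_1(3-32-1)", "Conv9_2(3-32-1)", "Conv10_1(1-1-1)",
]

CONV = re.compile(r"Conv\d+_\d+\((\d+)-(\d+)-(\d+)\)")


def trace(layers, scale, channels, skips=None):
    """Follow (scale, channels) through a layer list.

    `skips` maps a scale to the channel count of the encoder feature at
    that scale; it is only consulted by "Concat" (an assumed skip link).
    """
    rows = []
    for name in layers:
        match = CONV.fullmatch(name)
        if match:
            channels = int(match.group(2))  # assumed: second field = output channels
        elif name == "Max pooling":
            scale /= 2                      # assumed: stride-2 pooling
        elif name == "Upsampling":
            scale *= 2                      # assumed: 2x upsampling
        elif name == "Concat":
            channels += skips[scale]        # assumed: channel-wise concatenation
        # "CBAM module" applies attention and leaves the shape unchanged.
        rows.append((name, scale, channels))
    return rows


enc = trace(ENCODER, Fraction(1), 3)
# Later rows overwrite earlier ones, so each scale keeps its last encoder feature.
skips = {scale: channels for _, scale, channels in enc}
dec = trace(DECODER, enc[-1][1], enc[-1][2], skips)

for name, scale, channels in enc + dec:
    print(f"{name:18s} scale={str(scale):5s} channels={channels}")
```

Under these assumptions the encoder ends at 1/16 of the input resolution, and the decoder returns to full resolution with a single output channel.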
Method | Part_A MAE | Part_A RMSE | Part_B MAE | Part_B RMSE |
---|---|---|---|---|
MCNN | 110.2 | 173.2 | 26.4 | 41.3 |
CP-CNN | 73.6 | 106.4 | 20.1 | 30.1 |
CSRNet | 68.2 | 115.0 | 10.6 | 16.0 |
ASD | 65.6 | 98.0 | 8.5 | 13.7 |
PACNN | 66.3 | 106.4 | 8.9 | 13.5 |
SFANet | 59.8 | 99.3 | 6.9 | 10.9 |
DUBNet | 64.6 | 106.8 | 7.8 | 12.2 |
CSANet | 59.4 | 97.1 | 7.1 | 11.2 |
Table 2 Performance comparison of different methods on ShanghaiTech dataset
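The comparison tables report MAE and RMSE without defining them on this page. A minimal sketch, assuming the usual crowd-counting convention in which both metrics compare the predicted count with the ground-truth count of each test image: MAE averages the absolute count errors, and RMSE is the square root of the mean squared count error. The function names and the sample counts below are made up for illustration and are not results from the paper.

```python
import math


def mae(pred, truth):
    """Mean absolute error between predicted and true per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def rmse(pred, truth):
    """Root mean squared error between predicted and true per-image counts."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


# Hypothetical counts for four test images (illustrative values only).
truth = [100.0, 250.0, 40.0, 610.0]
pred = [110.0, 240.0, 45.0, 580.0]

print(f"MAE  = {mae(pred, truth):.2f}")
print(f"RMSE = {rmse(pred, truth):.2f}")
```

Because RMSE squares each error before averaging, a single badly miscounted image raises it more than it raises MAE, which is why the two columns can rank methods differently.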
Method | MAE | RMSE |
---|---|---|
MCNN | 277.0 | 426.0 |
CMTL | 252.0 | 514.0 |
IA-DCCN | 125.0 | 186.0 |
RANet | 111.0 | 190.0 |
DUBNet | 105.6 | 180.5 |
CSANet | 104.8 | 175.4 |
Table 3 Performance comparison of different methods on UCF_QNRF dataset
Method | MAE | RMSE |
---|---|---|
MCNN | 377.6 | 509.1 |
Switch-CNN | 318.1 | 439.2 |
CP-CNN | 295.8 | 320.9 |
CSRNet | 266.1 | 397.5 |
ASD | 196.2 | 270.9 |
PACNN | 267.9 | 357.8 |
CSANet | 191.3 | 262.8 |
Table 4 Performance comparison of different methods on UCF_CC_50 dataset
Module | Part_A MAE | Part_A RMSE | Part_B MAE | Part_B RMSE |
---|---|---|---|---|
Backbone network | 60.9 | 99.2 | 7.7 | 12.2 |
Backbone network + attention | 58.5 | 96.1 | 7.1 | 11.2 |
Table 5 Ablation study on ShanghaiTech dataset
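To put the ablation numbers in Table 5 on a common footing, the sketch below computes the relative error reduction obtained by adding attention to the backbone. The values are copied from Table 5; the helper `relative_reduction` and the dictionary layout are illustrative choices, not part of the paper.

```python
# Errors from Table 5 (lower is better).
BACKBONE = {
    "Part_A": {"MAE": 60.9, "RMSE": 99.2},
    "Part_B": {"MAE": 7.7, "RMSE": 12.2},
}
WITH_ATTENTION = {
    "Part_A": {"MAE": 58.5, "RMSE": 96.1},
    "Part_B": {"MAE": 7.1, "RMSE": 11.2},
}


def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


gains = {
    part: {
        metric: relative_reduction(BACKBONE[part][metric], WITH_ATTENTION[part][metric])
        for metric in BACKBONE[part]
    }
    for part in BACKBONE
}

for part, metrics in gains.items():
    for metric, gain in metrics.items():
        print(f"{part} {metric}: {gain:.1f}% lower with attention")
```

Expressing the change as a percentage makes the two parts comparable even though their absolute error levels differ by roughly an order of magnitude.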
[1] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2016-08-22)[2021-01-06]. https://arxiv.org/abs/1608.06197. |
[2] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778. |
[3] | HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 1580-1589. |
[4] | ZHANG C, LI H S, WANG X G, et al. Cross-scene crowd counting via deep convolutional neural networks[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 833-841. |
[5] | ZHANG Y Y, ZHOU D, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 589-597. |
[6] | LIU N, LONG Y C, ZOU C Q, et al. ADCrowdNet: an attention-injective deformable convolutional network for crowd understanding[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 3225-3234. |
[7] | VIOLA P, JONES M J. Robust real-time face detection[J]. International Journal of Computer Vision, 2004, 57(2): 137-154. |
[8] | DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893. |
[9] | DOLLAR P, WOJEK C, SCHIELE B, et al. Pedestrian detection: an evaluation of the state of the art[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 34(4): 743-761. |
[10] | FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 32(9): 1627-1645. |
[11] | CHEN K, GONG S G, XIANG T, et al. Cumulative attribute space for age and crowd density estimation[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 2467-2474. |
[12] | HOWARD A, SANDLER M, CHU G, et al. Searching for MobileNetV3[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 1314-1324. |
[13] | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[EB/OL]. (2016-01-06)[2021-01-06]. https://arxiv.org/abs/1506.01497. |
[14] | BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2021-01-06]. https://arxiv.org/abs/2004.10934. |
[15] | CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 833-851. |
[16] | YU Y, ZHU H L, QIAN J, et al. Survey on deep learning based crowd counting[J]. Journal of Computer Research and Development, 2021, 58(12): 2724-2747. |
[17] | GAO G S, GAO J Y, LIU Q J, et al. CNN-based density estimation and crowd counting: a survey[EB/OL]. (2020-03-28)[2021-01-06]. https://arxiv.org/abs/2003.12783. |
[18] | YU Y, ZHU H L, WANG L W, et al. Dense crowd counting based on adaptive scene division[J]. International Journal of Machine Learning and Cybernetics, 2021, 12(4): 931-942. |
[19] | SINDAGI V A, PATEL V M. Generating high-quality crowd density maps using contextual pyramid CNNs[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1861-1870. |
[20] | SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 4031-4039. |
[21] | CAO X K, WANG Z P, ZHAO Y Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]// LNCS 11209: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 734-750. |
[22] | LI Y H, ZHANG X F, CHEN D M. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 1091-1100. |
[23] | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19. |
[24] | DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, Jun 20-25, 2009. Washington: IEEE Computer Society, 2009: 248-255. |
[25] | WU X J, ZHENG Y B, YE H, et al. Adaptive scenario discovery for crowd counting[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, May 12-17, 2019. Piscataway: IEEE, 2019: 2382-2386. |
[26] | SHI M J, YANG Z H, XU C, et al. Revisiting perspective information for efficient crowd counting[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 7279-7288. |
[27] | ZHU L, ZHAO Z J, LU C, et al. Dual path multi-scale fusion networks with attention for crowd counting[EB/OL]. (2019-02-04)[2021-01-06]. https://arxiv.org/abs/1902.01115. |
[28] | OH M, OLSEN P, RAMAMURTHY K N. Crowd counting with decomposed uncertainty[C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI Press, 2020: 11799-11806. |
[29] | IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]// LNCS 11206: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 544-559. |
[30] | SINDAGI V A, PATEL V M. CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting[C]// Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, Aug 29-Sep 1, 2017. Washington: IEEE Computer Society, 2017: 1-6. |
[31] | SINDAGI V A, PATEL V M. Inverse attention guided deep crowd counting network[C]// Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance, Taipei, China, Sep 18-21, 2019. Piscataway: IEEE, 2019: 1-8. |
[32] | ZHANG A R, SHEN J Y, XIAO Z H, et al. Relational attention network for crowd counting[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6788-6797. |
[33] | IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 2547-2554. |