Crowd Density Detection Technology Based on Deep Semantic Segmentation

doi:10.3778/j.issn.1673-9418.2005066

Abstract:

As society develops and more people go out, crowded scenes are becoming increasingly common, which makes crowd density detection particularly important. To address the multi-scale problem in crowds, where people appear at different scales because of the camera perspective, a crowd density detection method based on deep semantic segmentation is proposed. The front end of the network uses an improved VGG network to extract crowd features and outputs a feature map 1/8 the size of the input image, which improves the accuracy of the predicted density map. The back end uses two arrays of atrous convolution modules with different dilation rates to capture multi-scale crowd features, allowing the network to capture more scale details and edge information. Finally, a 1×1 convolution combines the outputs to produce a high-quality predicted density map. In addition, to resolve the gridding effect caused by atrous convolution, a zigzag network structure is designed so that every pixel takes part in the zero-padded convolution, preserving the continuity of information and thereby improving the accuracy of the network. The network is evaluated on the ShanghaiTech and UCF_CC_50 datasets, where it outperforms current mainstream crowd density detection methods: its MAE improves on that of MCNN by 42.4% and 38.1%, and on that of SANet by 5.3% and 9.6%, respectively (a lower MAE is better).
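The gridding effect mentioned above can be illustrated by counting which input pixels actually reach one output pixel through a stack of stride-1 3×3 dilated convolutions. This is a minimal sketch, not the paper's network: the abstract does not give the dilation rates or the exact layout of the zigzag structure, so the schedules `[2, 2, 2]` (constant) and `[1, 2, 3]` (sawtooth-style, similar in spirit to hybrid dilated convolution designs) are hypothetical choices, and the function names are illustrative.

```python
from __future__ import annotations

from itertools import product


def dilated_taps(rate: int) -> set[tuple[int, int]]:
    """Offsets (dy, dx) read by a single 3x3 convolution with the given dilation rate."""
    return {(i * rate, j * rate) for i, j in product((-1, 0, 1), repeat=2)}


def reachable_offsets(rates: list[int]) -> set[tuple[int, int]]:
    """Input offsets that influence one output pixel after stacking stride-1
    3x3 dilated convolutions with the given rates (a Minkowski sum of tap sets)."""
    reach = {(0, 0)}
    for r in rates:
        taps = dilated_taps(r)
        reach = {(y + dy, x + dx) for (y, x) in reach for (dy, dx) in taps}
    return reach


def coverage(rates: list[int]) -> float:
    """Fraction of pixels inside the square receptive field that are actually sampled."""
    half = sum(rates)          # receptive-field radius for a stack of 3x3 kernels
    side = 2 * half + 1
    return len(reachable_offsets(rates)) / (side * side)


if __name__ == "__main__":
    # Hypothetical schedules: constant rate vs. a sawtooth-like increasing rate.
    for rates in ([2, 2, 2], [1, 2, 3]):
        print(rates, f"{coverage(rates):.2f}")
    # [2, 2, 2] 0.29
    # [1, 2, 3] 1.00
```

With the constant schedule only every other row and column of the 13×13 receptive field is sampled (49 of 169 pixels), while the varying schedule touches every pixel, which is the kind of information continuity the zigzag structure is described as restoring.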

Key words: crowd density detection, deep learning, atrous convolution, density map, zigzag network
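Density-map methods typically obtain the crowd count by summing the predicted density map, and the MAE figures quoted in the abstract are relative changes in mean absolute count error, where lower is better. A minimal sketch of that arithmetic, assuming counts are taken as density-map sums; the helper names and all numbers below are illustrative placeholders, not results from the paper:

```python
from __future__ import annotations


def count_from_density(density: list[list[float]]) -> float:
    """Estimated head count: the sum (discrete integral) of a density map."""
    return sum(sum(row) for row in density)


def mae(pred_counts: list[float], gt_counts: list[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts over a test set."""
    if len(pred_counts) != len(gt_counts) or not pred_counts:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(pred_counts)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Percentage by which new_mae is lower than baseline_mae (positive = improvement)."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


if __name__ == "__main__":
    toy_map = [[0.5, 0.25], [0.25, 1.0]]              # hypothetical 2x2 density map
    print(count_from_density(toy_map))                # 2.0
    print(mae([2.0, 10.0], [3.0, 7.0]))               # 2.0
    print(round(relative_reduction(100.0, 57.6), 1))  # 42.4
```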

MA Yu, DU Huimin, MAO Zhili, ZHANG Xia. Crowd Density Detection Technology Based on Deep Semantic Segmentation[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(8): 1469-1475.

References

[1] ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Piscataway: IEEE, 2016: 589-597.
[2] BOOMINATHAN L, KRUTHIVENTI S S S, BABU R V. CrowdNet: a deep convolutional network for dense crowd counting[C]//Proceedings of the 2016 ACM Conference on Multimedia, Amsterdam, Oct 15-19, 2016. New York: ACM, 2016: 640-644.
[3] SAM D B, SURYA S, BABU R V. Switching convolutional neural network for crowd counting[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Piscataway: IEEE, 2017: 4031-4039.
[4] ZHAO X Y. U-GAnet: multi-channel feature reconstruction of population density detection model[J]. Computer Knowledge and Technology, 2019, 15(35): 197-200.
[5] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 3rd International Conference on Learning Representations, San Diego, May 7-9, 2015: 1-14.
[6] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Piscataway: IEEE, 2016: 770-778.
[7] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Piscataway: IEEE, 2013: 2547-2554.
[8] SHI M J, YANG Z H, XU C, et al. Revisiting perspective information for efficient crowd counting[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 7271-7280.
[9] LI Y H, ZHANG X F, CHEN D M. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Piscataway: IEEE, 2018: 1091-1100.
[10] CAO X K, WANG Z P, ZHAO Y Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]//LNCS 11209: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Berlin, Heidelberg: Springer, 2018: 757-773.
[11] SINDAGI V A, PATEL V M. Inverse attention guided deep crowd counting network[C]//Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance, Taipei, China, Sep 18-21, 2019. Piscataway: IEEE, 2019: 1-8.
[12] ZHANG C, LI H S, WANG X G, et al. Cross-scene crowd counting via deep convolutional neural networks[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Piscataway: IEEE, 2015: 833-841.
[13] LIU L B, WANG H J, LI G B, et al. Crowd counting using deep recurrent spatial-aware network[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Jul 13-19, 2018: 849-855.
[14] WANG Q, GAO J Y, LIN W, et al. Learning from synthetic data for crowd counting in the wild[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 8198-8207.
[15] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[C]//Proceedings of the 4th International Conference on Learning Representations, San Juan, May 2-4, 2016: 1-10.
[16] YU F, KOLTUN V, FUNKHOUSER T A. Dilated residual networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Piscataway: IEEE, 2017: 636-644.