Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (8): 1469-1475. DOI: 10.3778/j.issn.1673-9418.2005066

• Artificial Intelligence •

Crowd Density Detection Technology Based on Deep Semantic Segmentation

MA Yu, DU Huimin, MAO Zhili, ZHANG Xia   

  1. School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China
  • Online: 2021-08-01    Published: 2021-08-02

Abstract:

As more people travel and gather in public, crowded scenes are becoming increasingly common, which makes crowd density detection particularly important. To address the multi-scale problem in crowd images, where the camera's viewing angle causes people to appear at different scales, a crowd density detection method based on deep semantic segmentation is proposed. The front end of the network uses an improved VGG network to extract crowd features and outputs a feature map at 1/8 the size of the original image, which improves the accuracy of the predicted density map. The back end consists of two arrays of atrous convolution modules with different dilation rates, which capture the multi-scale features of the crowd and allow the network to retain more scale detail and edge information. Finally, the network uses a 1×1 convolution to cascade the outputs into a high-quality predicted density map. To counter the gridding effect caused by atrous convolution, a zigzag network structure is designed so that every pixel takes part in the convolution after zero padding, which preserves the continuity of information and improves the accuracy of the network. The network is evaluated on the ShanghaiTech and UCF_CC_50 datasets, where it outperforms current mainstream crowd density detection methods. Its MAE improves on that of MCNN by 42.4% and 38.1%, and on that of SANet by 5.3% and 9.6%.
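
The gridding effect mentioned in the abstract can be illustrated with a small sketch. It is a one-dimensional simplification, not the authors' network: the function names `covered_offsets` and `has_holes` are hypothetical, and the dilation rates (2, 2, 2) and (1, 2, 3) are illustrative assumptions, since the abstract does not state the rates used in the paper. The sketch lists which input positions can influence one output position after stacking 3-tap atrous convolutions, and checks whether any positions inside the receptive field are skipped.

```python
def covered_offsets(rates, kernel=3):
    """Return the set of 1-D input offsets that influence one output
    position after stacking atrous convolutions with the given rates.

    `rates` and `kernel` are illustrative assumptions, not values
    taken from the paper.
    """
    half = kernel // 2
    offsets = {0}
    for r in rates:
        # A kernel with dilation rate r samples at multiples of r.
        taps = [k * r for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def has_holes(rates, kernel=3):
    """True if some position inside the receptive field is never sampled."""
    offs = covered_offsets(rates, kernel)
    return any(o not in offs for o in range(min(offs), max(offs) + 1))


constant = (2, 2, 2)   # same rate in every layer (assumed example)
sawtooth = (1, 2, 3)   # varying, zigzag-style rates (assumed example)

print(sorted(covered_offsets(constant)))  # only even offsets are sampled
print(has_holes(constant))                # True: gridding effect
print(has_holes(sawtooth))                # False: every offset is covered
```

With a constant rate of 2, only every second input position contributes, so neighbouring pixels are computed from disjoint inputs. Varying the rate from layer to layer fills those gaps while the receptive field stays the same width, which is the general idea behind a zigzag arrangement of dilation rates.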

Key words: crowd density detection, deep learning, atrous convolution, density map, zigzag network