Optimized Pedestrian Detection Algorithm for Norm-DP Model

doi:10.3778/j.issn.1673-9418.2005001

Abstract:

The traditional deep pyramid model has attracted considerable attention as an effective pedestrian detection algorithm. It combines the deformable part model with a convolutional neural network model. However, the algorithms used in the feature extraction stage operate on pixel regions of different sizes, so the two models cannot be fully fused, and detection performance is unsatisfactory in scenes with many pedestrians, complex postures, or occlusion. This paper therefore proposes a deep pyramid model algorithm based on a normalization function (Norm-DP). The algorithm uses the normalization function to fuse the deformable part model and the convolutional neural network model, extracts positive and negative samples directly from the pyramid features, and trains the model with a latent-variable support vector machine. The detection boxes are then refined by combining the soft non-maximum suppression (soft-NMS) algorithm with the bounding box regression (BBR) algorithm. Experiments are carried out on the INRIA and MS COCO datasets. In scenes with many pedestrians, complex postures, or occlusion, the detection accuracy of the proposed algorithm is higher than that of the best-performing deformable part model algorithm, convolutional neural network algorithm, deep pyramid model algorithm, and convolutional neural network algorithm combined with region selection.

Key words: convolutional neural network (CNN), deformable part model algorithm, normalization deep pyramid (Norm-DP), soft-non-maximum suppression (Soft-NMS), bounding box regression (BBR)
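
As an illustration of the soft-NMS step named in the abstract, the following is a minimal sketch of the Gaussian variant, in which detections that overlap a higher-scoring box have their scores decayed rather than being discarded. It is not the authors' implementation: the function names `iou` and `soft_nms`, the `(x1, y1, x2, y2)` box format, and the values of `sigma` and `score_threshold` are assumptions chosen for the example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: decay the scores of overlapping boxes.

    Returns (box, score) pairs in the order they were selected.
    sigma and score_threshold are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    candidates = [(tuple(b), float(s)) for b, s in zip(boxes, scores)]
    kept = []
    while candidates:
        # Select the highest-scoring remaining detection.
        best_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_idx)
        kept.append((best_box, best_score))
        # Down-weight the remaining boxes by their overlap with the selection.
        rescored = []
        for box, score in candidates:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score))
        candidates = rescored
    return kept


# Two heavily overlapping detections and one isolated detection.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
example_scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(example_boxes, example_scores):
    print(box, round(score, 3))
```

In this example the second box is kept but with a reduced score, so it is ranked below the isolated third box. Hard NMS with a typical overlap threshold would instead delete it, which is the behavior that hurts recall when pedestrians stand close together or occlude one another.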

CHAI Enhui, MA Zhanfei, ZHI Min. Optimized Pedestrian Detection Algorithm for Norm-DP Model[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(3): 545-552.
