Counting Method Based on Density Graph Regression and Object Detection

doi:10.3778/j.issn.1673-9418.2209065

Abstract

Abstract: Dense counting is dominated by two families of methods: detection-based methods, which suffer from low recall, and density-map-based methods, which lose target location information. To address both problems, a detection and counting method based on density map regression is proposed that combines the detection and regression tasks, achieving both counting and localization of target objects in dense scenes. Combining the complementary strengths of the two approaches improves recall while also marking the position of every target. To extract richer feature information for complex data scenarios, a feature pyramid optimization module is proposed that vertically fuses low-level high-resolution features with top-level abstract semantic features and horizontally fuses features of the same size, enriching the semantic representation of target objects. Because target objects occupy a small proportion of pixels in dense counting scenes, an attention mechanism for small targets is proposed that constructs a mask on the input image to strengthen the network's attention to target objects and thereby improve its detection sensitivity. Experimental results show that the proposed method significantly improves recall while maintaining accuracy, accurately locates targets, and provides both counting and localization information for the input image, giving it broad application prospects in fields such as industry and ecology.

Key words: dense counting, object detection, deep learning, density map regression, feature pyramid
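
The counting half of the approach rests on a general property of density maps: if every annotated object contributes a kernel of unit mass, the sum over the map is the object count. The sketch below illustrates only that generic property with NumPy. It is not the network proposed in this paper, and the function names, kernel size, sigma, and sample points are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a size x size Gaussian kernel normalised to sum to 1 (size must be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, shape, size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Build a density map with one unit-mass Gaussian per (row, col) point.

    Kernels that overlap the image border are cropped and renormalised,
    so every object still contributes exactly 1 to the total.
    """
    h, w = shape
    pad = size // 2
    kernel = gaussian_kernel(size, sigma)
    dmap = np.zeros((h, w), dtype=np.float64)
    for r, c in points:
        # Clip the kernel footprint to the image bounds.
        r0, r1 = max(r - pad, 0), min(r + pad + 1, h)
        c0, c1 = max(c - pad, 0), min(c + pad + 1, w)
        # Matching window inside the kernel itself.
        patch = kernel[r0 - (r - pad):r1 - (r - pad), c0 - (c - pad):c1 - (c - pad)]
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


# Hypothetical point annotations (row, col), including points on the border.
points = [(0, 0), (10, 20), (63, 40)]
dmap = density_map(points, (64, 64))
estimated_count = float(dmap.sum())
print(round(estimated_count, 6))  # equals the number of points, up to rounding
```

Summing the map recovers the count but not which pixel belongs to which object, which is the missing location information that the abstract says the detection branch is meant to supply.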

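Abstract (Chinese version): Of the two mainstream approaches to dense counting, detection-based methods have low recall and density-map-based methods lose the location information of target objects. To address these problems, the detection task and the regression task are combined into a detection and counting method based on density map regression, which both counts and locates target objects in dense scenes; the two approaches complement each other, so that recall improves while the positions of all target objects are marked. To extract richer feature information for complex data scenarios, the network introduces a feature pyramid optimization module, which vertically fuses low-level high-resolution features with top-level abstract semantic features and horizontally fuses features of the same size, enriching the semantic representation of target objects. Because target objects occupy a low proportion of pixels in dense counting scenes, an attention mechanism for small targets is proposed, which builds a mask on the input image to strengthen the network's attention to target objects and thus improve its detection sensitivity. Experimental results show that the proposed method greatly improves recall while keeping accuracy essentially unchanged, accurately marks the positions of target objects, and effectively provides counting and localization information for the input image, giving it broad application prospects in fields such as industry and ecology.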

Key words (Chinese version): dense counting, object detection, deep learning, density map regression, feature pyramid

GAO Jie, ZHAO Xinxin, YU Jian, XU Tianyi, PAN Li, YANG Jun, YU Mei, LI Xuewei. Counting Method Based on Density Graph Regression and Object Detection[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 127-137.

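GAO Jie, ZHAO Xinxin, YU Jian, XU Tianyi, PAN Li, YANG Jun, YU Mei, LI Xuewei. Research on dense counting combining density map regression and detection[J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, 18(1): 127-137. (in Chinese)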
