Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (1): 163-172. DOI: 10.3778/j.issn.1673-9418.2003008

• Graphics and Images •

Video Target Detection Based on Improved YOLOV3 Algorithm

SONG Yanyan, TAN Li, MA Zihao, REN Xueping   

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  • Online: 2021-01-01  Published: 2021-01-07

Abstract:

Pedestrian detection in surveillance video suffers from complex backgrounds, large variation in target scale and pose, and mutual occlusion between people and surrounding objects. As a result, YOLOV3 detects some targets inaccurately, producing false, missed, or repeated detections. Therefore, building on the YOLOV3 network and drawing on the idea of residual structures, shallow and deep features are upsampled, connected, and fused to obtain an additional 104×104 detection layer, and the bounding-box sizes obtained by K-means clustering are assigned to the network layer of each scale, which increases the sensitivity of the network to multi-scale, multi-pose targets and improves detection performance. At the same time, the YOLOV3 loss function is updated with a repulsion loss that penalizes overlap between a predicted box and other surrounding targets, so that the predicted box moves closer to its correct target and away from wrong targets; this lowers the false detection rate of the model and improves detection when targets occlude each other. Experimental results on the MOT16 dataset show that the proposed network model achieves better detection performance than the YOLOV3 algorithm, demonstrating the effectiveness of the method.
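
To make two of the modifications described above concrete, the following sketches reconstruct them in Python. They are illustrative reconstructions under stated assumptions, not the authors' implementation. The first is a minimal K-means anchor-clustering routine using the 1 - IoU distance; the function names, the (width, height) input format, and the choice of k = 12 anchors (3 for each of the 13×13, 26×26, 52×52, and added 104×104 detection scales at a 416×416 input) are assumptions.

# Minimal sketch of anchor-size clustering with K-means under the 1 - IoU distance.
# Assumptions (not from the paper): boxes are an (N, 2) array of (width, height)
# pairs, and k = 12 anchors are produced so that 3 can be assigned to each of the
# four detection scales (13x13, 26x26, 52x52 and the added 104x104 layer).
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) boxes and centroids, treating both as anchored at the origin."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes, k=12, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchor sizes; the nearest centroid is the one with the highest IoU."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Sort by area so the smallest anchors can be handed to the finest (104x104) scale.
    return centroids[np.argsort(centroids.prod(axis=1))]

The second sketch adds a repulsion term to the detection loss in the spirit of the abstract's description: each predicted box is penalized for overlapping the ground-truth boxes of other targets, so optimization pulls it toward its own target and pushes it away from neighbors. The helper names, the box format (x1, y1, x2, y2), the IoG/smooth-ln formulation, the threshold sigma, and the balancing weight alpha are assumptions rather than details taken from the paper.

# Minimal sketch of a repulsion term added to the YOLOV3 loss: each predicted box is
# pushed away from the ground-truth boxes of *other* targets it overlaps, measured by
# IoG (intersection over ground-truth area) and penalized with a smooth-ln function.
# Box format (x1, y1, x2, y2), sigma, alpha, and the use of IoG (rather than IoU) to
# select the repulsion target are simplifying assumptions.
import numpy as np

def iog(pred, gt):
    """Intersection of pred with gt, divided by the area of gt."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / gt_area if gt_area > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """Penalty that grows with overlap and stays finite as x approaches 1."""
    if x <= sigma:
        return -np.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - np.log(1.0 - sigma)

def repulsion_term(pred_boxes, assigned_gt, gt_boxes, sigma=0.5):
    """Mean penalty of each predicted box against its most-overlapping non-target ground truth."""
    terms = []
    for pred, own in zip(pred_boxes, assigned_gt):
        overlaps = [iog(pred, gt) for j, gt in enumerate(gt_boxes) if j != own]
        if overlaps:
            terms.append(smooth_ln(max(overlaps), sigma))
    return float(np.mean(terms)) if terms else 0.0

# The term would be weighted and added to the usual YOLOV3 loss, e.g.:
#   total_loss = yolo_v3_loss + alpha * repulsion_term(pred_boxes, assigned_gt, gt_boxes)
# where alpha is a balancing hyperparameter (its value is not taken from the paper).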

Key words: target detection, YOLOV3 algorithm, repulsion loss, deep learning, video understanding