Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (12): 2108-2121. DOI: 10.3778/j.issn.1673-9418.1911014


Bilateral Video Object Segmentation Using Dynamic Appearance Modeling and Higher-Order Potential

TIAN Ying, GUI Yan, XIONG Daming   

  1. School of Computer & Communication Engineering, Changsha University of Science & Technology, Changsha 410114, China
  2. Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science & Technology, Changsha 410114, China
  • Online: 2020-12-01 Published: 2020-12-11

动态外观模型和高阶能量的双边视频目标分割方法

田颖, 桂彦, 熊达铭

  1. 长沙理工大学 计算机与通信工程学院,长沙 410114
  2. 长沙理工大学 综合交通运输大数据智能处理湖南省重点实验室,长沙 410114

Abstract:

To address the poor quality and low time efficiency of video object segmentation in complex scenes, this paper proposes a bilateral video object segmentation method based on dynamic appearance modeling and a higher-order potential. The method formulates video object segmentation as a binary labeling problem on a Markov random field (MRF) defined over bilateral grid cells. First, each pixel of a video sequence with labeled keyframes is resampled into a higher-dimensional bilateral grid, which greatly reduces the amount of video data to be processed. Second, a graph-cut optimization model is constructed with the non-empty grid cells as graph nodes; the key steps are to build a dynamic appearance model with confidence measurements and to introduce a robust higher-order potential into the energy function. Finally, the max-flow/min-cut algorithm solves the global optimization problem and assigns a binary label to each pixel, yielding the video object segmentation. Experimental results on the DAVIS 2016 and SegTrack v2 datasets show that, with little user interaction, the method obtains high-quality segmentation results for videos with complex scenes and also significantly improves time efficiency.
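
The data reduction described in the first step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a six-dimensional feature (x, y, t, r, g, b), nearest-neighbour binning, and hypothetical cell sizes, and it replaces the graph-cut solution with a placeholder rule that labels bright cells as foreground. The function names `splat` and `slice_labels` are illustrative only.

```python
from collections import defaultdict


def splat(pixels, cell_size):
    """Bin each pixel feature vector into a bilateral grid cell.

    pixels: list of (x, y, t, r, g, b) tuples (assumed feature layout).
    cell_size: six positive bin widths, one per dimension (hypothetical values).
    Returns (cells, index): cells maps a cell key to the pixel indices it holds,
    and index[i] is the cell key of pixel i.
    """
    cells = defaultdict(list)
    index = []
    for i, p in enumerate(pixels):
        # Nearest-neighbour binning: quantize every coordinate independently.
        key = tuple(int(v // s) for v, s in zip(p, cell_size))
        cells[key].append(i)
        index.append(key)
    return dict(cells), index


def slice_labels(index, cell_labels):
    """Copy each cell's binary label back to the pixels it contains."""
    return [cell_labels[key] for key in index]


# Toy video: 2 frames of 4x4 pixels, bright left half and dark right half.
pixels = [
    (x, y, t, *((200, 200, 200) if x < 2 else (20, 20, 20)))
    for t in range(2)
    for y in range(4)
    for x in range(4)
]
cells, index = splat(pixels, cell_size=(2, 2, 2, 64, 64, 64))

# Placeholder for the graph-cut result: label cells in bright colour bins as 1.
cell_labels = {key: 1 if key[3] >= 2 else 0 for key in cells}
labels = slice_labels(index, cell_labels)

print(len(pixels), "pixels ->", len(cells), "non-empty cells")
```

In this toy case the labeling problem is posed over the non-empty cells rather than over every pixel, which is the source of the efficiency gain the abstract refers to; in the method itself the cell labels would come from the max-flow/min-cut solution rather than from the brightness rule used here.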

Key words: video object segmentation, bilateral space, bilateral grid, confidence-based dynamic appearance model, higher-order energy potential

摘要:

针对复杂场景下视频目标分割质量不佳和时间效率低下的问题,提出了一种动态外观模型和高阶能量的双边视频目标分割方法,将视频目标分割转换为基于双边网格单元的马尔可夫随机场(MRF)模型求解问题。首先将带关键帧标记的视频序列映射至高维的双边网格,极大地减少待处理的数据。然后以非空网格单元作为图的结点并构建图割优化模型,其关键在于定义了具有置信度判别的动态外观模型,并在能量函数中引入鲁棒的高阶能量项。最后利用最大流/最小割算法进行全局优化求解,为视频像素点分配二值标签,最终获得高质量的视频目标分割结果。采用DAVIS 2016和SegTrack v2数据的实验结果表明,该方法在提供少量用户交互的情况下,不仅能在处理具有复杂场景的视频时获得理想的视频目标分割结果,而且还能显著提高视频目标分割的时间效率。

关键词: 视频目标分割, 双边空间, 双边网格, 置信动态外观模型, 高阶能量项