Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2021, Vol. 15, Issue (3): 486-494. DOI: 10.3778/j.issn.1673-9418.1912044

• Artificial Intelligence •

Context Information Fusion Method for Temporal Action Proposals

WANG Xinwen (王新文), XIE Linbo (谢林柏), PENG Li (彭力)

  1. Engineering Research Center of Internet of Things Technology Applications (School of Internet of Things Engineering, Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China
  • Online: 2021-03-01 Published: 2021-03-05

Abstract:

In human activity localization and recognition in videos, existing temporal action proposal methods do not handle the long-term dependencies of action features well, which leads to low proposal recall. To address this problem, this paper proposes a temporal action proposal method based on context information fusion. First, a 3D convolutional network extracts the spatiotemporal features of video units. Then, a bidirectional gated recurrent network models the contextual relationships among units and predicts the temporal action intervals. Because the gated recurrent unit (GRU) has many parameters and suffers from vanishing gradients, a simplified GRU (S-GRU) is proposed: its gating structure is controlled by the input features, which improves parallel computing capability, and a weighted average is introduced to strengthen the adaptive fusion of historical and current information. Finally, experiments and comparisons on the Thumos14 dataset show that the temporal action proposal method based on the bidirectional S-GRU network improves proposal recall.

Key words: gated recurrent network (GRU), vanishing gradient, context information, temporal action proposals, temporal action detection
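The abstract does not give the S-GRU equations, so the following is only an illustrative sketch of the two ideas it names: a gate driven by the input features alone, and a weighted average of the previous state and the current candidate. The gate form `f = sigmoid(x W_f + b_f)`, the candidate `c = tanh(x W_c + b_c)`, and all names here (`simplified_gru`, `bidirectional_context`, `init_params`, the sizes `T`, `D`, `H`) are assumptions made for illustration, not the authors' formulation. Random vectors stand in for the 3D-convolutional unit features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simplified_gru(x, W_f, b_f, W_c, b_c):
    """Hypothetical simplified recurrent cell. x: (T, D) -> states: (T, H)."""
    # Assumption: gate and candidate depend only on the input, so they can be
    # computed for every time step at once (no dependence on the hidden state).
    f = sigmoid(x @ W_f + b_f)   # (T, H) gate values in (0, 1)
    c = np.tanh(x @ W_c + b_c)   # (T, H) candidate states in (-1, 1)
    h = np.zeros(f.shape[1])     # initial hidden state
    out = np.empty_like(f)
    for t in range(x.shape[0]):
        # Weighted average of history (h) and current information (c[t]).
        h = f[t] * h + (1.0 - f[t]) * c[t]
        out[t] = h
    return out

def bidirectional_context(x, fwd_params, bwd_params):
    """Concatenate forward and backward states for each video unit."""
    h_fwd = simplified_gru(x, *fwd_params)
    h_bwd = simplified_gru(x[::-1], *bwd_params)[::-1]  # run reversed, re-align
    return np.concatenate([h_fwd, h_bwd], axis=1)       # (T, 2H)

def init_params(rng, d_in, d_hid):
    """Small random weights and zero biases for one direction."""
    return (rng.normal(scale=0.1, size=(d_in, d_hid)), np.zeros(d_hid),
            rng.normal(scale=0.1, size=(d_in, d_hid)), np.zeros(d_hid))

rng = np.random.default_rng(0)
T, D, H = 8, 16, 4                   # units, feature size, hidden size (arbitrary)
units = rng.normal(size=(T, D))      # stand-in for 3D-CNN unit features
fwd, bwd = init_params(rng, D, H), init_params(rng, D, H)
ctx = bidirectional_context(units, fwd, bwd)
print(ctx.shape)  # (8, 8)
```

Because each state is a convex combination of the previous state and a bounded candidate, the fused context stays within (-1, 1). In this sketch only the cheap elementwise recurrence remains sequential, which is one plausible reading of the parallelism claim in the abstract.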