Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (10): 1754-1761. DOI: 10.3778/j.issn.1673-9418.1909044

• Graphics and Image •

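Short Video Behavior Recognition Combining Scene and Behavior Features

DONG Xu (董旭), TAN Li (谭励), ZHOU Lina (周丽娜), SONG Yanyan (宋艳艳)

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China
  • Online: 2020-10-01  Published: 2020-10-12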

Abstract:

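Current behavior recognition methods focus mainly on the action itself, but short videos carry relatively little information, so multiple kinds of feature information in a video need to be exploited to improve the accuracy of human behavior recognition. This paper therefore studies a short-video behavior recognition method based on joint scene and behavior features, in which scene information serves as context to improve on conventional behavior recognition networks that rely on a single type of feature. First, scene features are extracted from the short video with a deep fusion network. Next, behavior features, namely RGB features and optical-flow features, are extracted with a deformable convolutional network. Finally, dictionary learning is used to obtain a sparse representation of the constructed joint features, yielding more interpretable feature information. The method achieves a top-5 accuracy of 33% on the Charades test set, outperforming the conventional single-feature behavior recognition network.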
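The final step, sparse representation of the joint feature, can be illustrated with a minimal sketch. It assumes (1) that the joint feature is a plain concatenation of the scene, RGB and flow vectors, and (2) that a dictionary has already been learned; only the sparse-coding half of dictionary learning is shown. The solver is ISTA (iterative soft-thresholding) for the lasso objective 0.5·||x − Da||² + λ·||a||₁, chosen here for illustration and not necessarily the solver used in the paper. The names `joint_feature` and `sparse_code` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def joint_feature(scene: Vector, rgb: Vector, flow: Vector) -> Vector:
    """Hypothetical fusion: concatenate the scene, RGB and flow features."""
    return list(scene) + list(rgb) + list(flow)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x: Vector, atoms: List[Vector], lam: float = 0.1,
                n_iter: int = 200) -> Vector:
    """ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1.

    `atoms[k]` is the k-th dictionary column (assumed already learned).
    """
    k = len(atoms)
    # Precompute G = D^T D and c = D^T x; the gradient of the quadratic term is G a - c.
    G = [[dot(atoms[i], atoms[j]) for j in range(k)] for i in range(k)]
    c = [dot(atoms[i], x) for i in range(k)]
    # Step size 1/L, where L bounds the largest eigenvalue of G
    # (Gershgorin: maximum absolute row sum of the symmetric matrix G).
    L = max(sum(abs(g) for g in row) for row in G) or 1.0
    a = [0.0] * k
    for _ in range(n_iter):
        grad = [dot(G[i], a) - c[i] for i in range(k)]
        # Gradient step on the smooth term, then shrinkage for the L1 term.
        a = [soft_threshold(a[i] - grad[i] / L, lam / L) for i in range(k)]
    return a


if __name__ == "__main__":
    x = joint_feature([1.0], [0.0], [2.0])  # toy 1-D "streams"
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(sparse_code(x, identity, lam=1.5))
```

With an identity dictionary, ISTA reduces to element-wise soft-thresholding, so with `lam=1.5` the small component is zeroed and the code is approximately `[0.0, 0.0, 0.5]`. With a learned, overcomplete dictionary, the same loop returns a sparse combination of atoms, and that sparse vector, rather than the raw concatenation, would be passed to the classifier.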

Key words: scene recognition, action recognition, dictionary learning, deep learning, video understanding
