Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (2): 323-336. DOI: 10.3778/j.issn.1673-9418.2106004

• Survey · Exploration •

Progress on Human-Object Interaction Detection of Deep Learning

RUAN Chenzhao, ZHANG Xiangsen, LIU Ke, ZHAO Zengshun+

  1. College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
  • Received: 2021-06-01 Revised: 2021-08-06 Published: 2022-02-01 Online: 2021-08-19
  • Corresponding author: + E-mail: zhaozengshun@163.com
  • About author: RUAN Chenzhao, born in 1996, from Zibo, Shandong, M.S. candidate. His research interests include computer vision and image processing.
    ZHANG Xiangsen, born in 1997, from Zoucheng, Shandong, M.S. candidate. His research interests include deep learning and image processing.
    LIU Ke, born in 1998, from Qingdao, Shandong, M.S. candidate. His research interests include deep learning and image processing.
    ZHAO Zengshun, born in 1975, from Binzhou, Shandong, Ph.D., associate professor. His research interests include computer vision, intelligent robots and machine learning.
  • Supported by:
    Special Funded Project of the China Postdoctoral Science Foundation (2015T80717); Natural Science Foundation of Shandong Province (ZR2020MF086)

Abstract:

Human-object interaction (HOI) detection takes an image as input and detects the humans and objects in it that are engaged in an interaction, together with the verbs that describe those interactions. Following object detection, image segmentation and object tracking, it is a newer task in computer vision that aims at a deeper understanding of images. To fill the current lack of surveys on deep-learning-based HOI detection, this paper classifies and analyzes deep-learning-based HOI detection methods, using the development of the field as its main thread. It first briefly summarizes early methods, then divides existing algorithms into two-stage and one-stage methods according to model structure, and analyzes representative algorithms of each. The discussion focuses on two-stage methods, which are grouped into three categories: attention-based methods, graph-model-based methods, and methods based on human pose and body parts; the basic idea, advantages and disadvantages of each category are summarized. In addition, the evaluation metrics and benchmark datasets for HOI detection are described in detail, the experimental results of most existing methods are reported, and the results achieved by the different categories of methods are discussed. Finally, the main challenges facing this technology are summarized and analyzed, and future development trends are discussed.

Key words: human-object interaction (HOI) detection, computer vision, object detection, deep learning
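The task definition above (an image goes in; the interacting humans, objects and verbs come out) and the two-stage structure (detect first, then classify interactions for human-object pairs) can be made concrete with a short sketch. It is a minimal illustration under stated assumptions, not any surveyed method: the names `Detection`, `HOITriplet`, `two_stage_hoi` and `is_true_positive` are hypothetical, the `scorer` argument stands in for a learned interaction classifier, and multiplying detector and verb confidences is only one common way to combine them. The matching rule (correct verb and object class, with both the human box and the object box at IoU ≥ 0.5 against the ground truth) follows the criterion commonly used on benchmarks such as HICO-DET.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Axis-aligned box (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    """Hypothetical stage-1 output: one box from an object detector."""
    box: Box
    label: str    # object category, e.g. "person", "bicycle"
    score: float  # detector confidence


@dataclass(frozen=True)
class HOITriplet:
    """One <human, verb, object> prediction or ground-truth annotation."""
    human_box: Box
    object_box: Box
    object_label: str
    verb: str
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical interaction classifier: maps a (human, object) pair to verb scores.
VerbScorer = Callable[[Detection, Detection], Dict[str, float]]


def two_stage_hoi(dets: List[Detection], scorer: VerbScorer,
                  min_score: float = 0.5) -> List[HOITriplet]:
    """Stage 2: score every human-object pair built from stage-1 detections."""
    humans = [d for d in dets if d.label == "person"]
    triplets: List[HOITriplet] = []
    for h in humans:
        for o in dets:
            if o is h:
                continue  # a person may be an object, but not of their own interaction
            for verb, s in scorer(h, o).items():
                # Assumption: combine confidences by multiplication.
                score = h.score * o.score * s
                if score >= min_score:
                    triplets.append(HOITriplet(h.box, o.box, o.label, verb, score))
    return triplets


def is_true_positive(pred: HOITriplet, gt: HOITriplet, thr: float = 0.5) -> bool:
    """Match if verb and object class agree and both boxes reach IoU >= thr."""
    return (pred.verb == gt.verb
            and pred.object_label == gt.object_label
            and iou(pred.human_box, gt.human_box) >= thr
            and iou(pred.object_box, gt.object_box) >= thr)
```

For example, with one detected person and one bicycle and a toy scorer that returns `{"ride": 0.9}` for bicycles, `two_stage_hoi` yields a single "ride" triplet, which `is_true_positive` accepts against a ground-truth annotation with the same boxes. Enumerating every human-object pair is what makes two-stage pipelines simple but quadratic in the number of detections, which is one motivation for the one-stage designs the survey also covers.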
