Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (10): 1958-1968. DOI: 10.3778/j.issn.1673-9418.2007007

• Artificial Intelligence •


Scene Graph Generation Method Based on External Information Guidance and Residual Scrambling

TIAN Xin, JI Yi, GAO Haiyan, LIN Xin, LIU Chunping   

  1. College of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2. Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Online: 2021-09-30 Published: 2021-10-01


Abstract:

Scene graphs represent both the semantics and the organizational structure of visual scene content, which facilitates visual understanding and interpretable reasoning. For this reason they have become a research hotspot in computer vision. However, the relationships between objects are annotated unevenly in existing visual scene datasets, so current scene graph generation methods suffer from dataset bias. This paper studies the data imbalance problem in scene graphs and proposes EGRES, a scene graph generation method that combines external information guidance with residual scrambling to reduce the negative impact of dataset bias on scene graph generation. The method uses unbiased commonsense knowledge from an external knowledge base to constrain the semantic space of the scene graph. This alleviates the imbalanced distribution of relationship data in the dataset and improves the generalization ability of scene graph generation. Residual scrambling is then used to fuse the visual features with the extracted commonsense knowledge, which regularizes the scene graph generation network. Comparison and ablation experiments on the Visual Genome (VG) dataset show that the proposed method effectively improves scene graph generation. Per-category comparisons across the relationship labels in the dataset show that the method improves generation performance for most relationship categories, especially medium- and low-frequency ones. It thereby substantially mitigates the effect of imbalanced annotation and achieves better results than existing scene graph generation methods.

Key words: dataset bias, residual scrambling, external knowledge base, scene graph generation
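The abstract attributes dataset bias to imbalanced relationship annotations and reports its gains mainly on medium- and low-frequency relationship categories. The Python sketch below shows one common way to make that grouping concrete. It counts how often each predicate occurs in the training triplets, assigns each predicate to a frequency band, and then averages per-predicate recall within each band. The thresholds (`HIGH_MIN`, `MID_MIN`), the function names, and the toy predicates and recall values are all hypothetical illustrations. They are not the paper's evaluation protocol or its results.

```python
from collections import Counter

# Hypothetical band thresholds (training triplets per predicate).
# Illustrative only; the paper's actual frequency split is not given here.
HIGH_MIN = 1000
MID_MIN = 100


def frequency_bands(predicates, high_min=HIGH_MIN, mid_min=MID_MIN):
    """Group predicate labels into high/medium/low bands by annotation count."""
    counts = Counter(predicates)  # keeps labels in first-seen order
    bands = {"high": [], "medium": [], "low": []}
    for label, n in counts.items():
        if n >= high_min:
            bands["high"].append(label)
        elif n >= mid_min:
            bands["medium"].append(label)
        else:
            bands["low"].append(label)
    return bands


def band_mean_recall(per_class_recall, bands):
    """Average per-predicate recall inside each band; None for an empty band."""
    result = {}
    for band, labels in bands.items():
        values = [per_class_recall[label] for label in labels if label in per_class_recall]
        result[band] = sum(values) / len(values) if values else None
    return result


# Toy, made-up data: a long-tailed list of training predicates.
train_predicates = ["on"] * 1500 + ["has"] * 1200 + ["riding"] * 300 + ["parked on"] * 40
bands = frequency_bands(train_predicates)

# Toy per-predicate recall values (not results from the paper).
recall = {"on": 0.8, "has": 0.7, "riding": 0.4, "parked on": 0.1}
summary = band_mean_recall(recall, bands)
print(bands)
print(summary)
```

Averaging inside a band gives each predicate equal weight. As a result, a few dominant predicates such as "on" cannot mask poor recall on rare ones. This is why per-category or per-band views are used to judge whether a method actually helps the long tail.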