Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (11): 1345-1357. DOI: 10.3778/j.issn.1673-9418.1407057

Strategy of Extracting Chinese Entities Relation Based on Predicate Concept Connectivity

XIA Jiali1, CHENG Chunlei1,2+, CHEN Hui3, CAO Zhonghua1,3, LI Guangquan1   

    1. School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330032, China
    2. School of Computer Science, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China
    3. School of Software & Communication Engineering, Jiangxi University of Finance and Economics, Nanchang 330032, China
  • Online: 2014-11-01 Published: 2014-11-04

谓词概念连通度的中文实体关系抽取策略

夏家莉1,程春雷1,2+,陈  辉3,曹重华1,3,李光泉1   

    1. 江西财经大学 信息管理学院,南昌 330032
    2. 江西中医药大学 计算机学院,南昌 330004
    3. 江西财经大学 软件与通信工程学院,南昌 330032

Abstract: Chinese entity relation extraction is a research focus of text retrieval and knowledge discovery in open-domain corpora. Traditional extraction strategies suffer from a heavy manual annotation workload, limited pattern generality, and relatively fixed relation granularity, all of which restrict their extraction performance, especially on open corpora. Based on the hierarchical structure and relational connectivity of concepts, this paper builds a predicate concept model (PCM) for Chinese entity relations. On top of this model, it proposes an incremental-learning predicate concept acquisition strategy (PCIA) and a relation extraction strategy based on predicate concept connectivity (PCCS), and uses them to extract loosely coupled, long-distance entity relations in the open domain. Each predicate concept is constructed relatively independently, and concepts can be combined more flexibly, so the resulting description of relations has better generality and interpretability and provides an effective means of identifying and extracting unknown relations in open corpora. Experimental results show that PCCS improves the quality of Chinese entity identification and of entity connectivity path selection, and achieves good entity relation extraction performance.

Key words: entity relation, predicate concept model (PCM), concept association degree, concept connectivity

摘要: 中文实体关系抽取是开放域文本检索与知识发现的研究热点,传统的抽取策略普遍存在人工标注量大,模式通用性受限,关系抽取粒度相对固定等问题,限制了其在开放领域的关系抽取效果。基于概念的结构分层和关系连通,面向中文实体关系构建了谓词概念模型(predicate concept model,PCM),在此基础上,提出了增量学习的谓词概念获取策略PCIA和基于谓词概念连通的关系抽取策略PCCS,由此进行了开放域非紧密的、远距离实体关系的抽取。各谓词概念的构建相对独立,概念组合更为灵活,对关系的描述具有更好的通用性和可解释性,为开放域未知关系的识别与抽取提供了有效手段。实验结果表明,PCCS有效提升了中文实体识别及实体连通路径选择的质量,获得了良好的关系抽取性能。

关键词: 实体关系, 谓词概念模型(PCM), 概念相关度, 概念连通度