Journal of Frontiers of Computer Science and Technology ›› 2007, Vol. 1 ›› Issue (2): 206-215.

• Academic Research •

An ontology-theme-based method of acquiring knowledge from Chinese natural language documents

CHE Haiyan1+, SUN Jigui1,2, JING Tao1, BAI Xi1

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-08-20 Published: 2007-08-20
  • Contact: CHE Haiyan

An ontology-theme-based method for Chinese knowledge acquisition

车海燕1+,孙吉贵1,2,荆 涛1,白 曦1

  

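  1. College of Computer Science and Technology, Jilin University, Changchun 130012
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012
  • Corresponding author: CHE Haiyan (车海燕)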

Abstract: Acquiring knowledge from Chinese natural language documents is difficult because of the particular characteristics of the Chinese language. Although Chinese named entity recognition (NER) has made considerable progress, it is hardly possible to correctly extract the binary relationships between pairs of recognized entities without resources such as synonym tables or a Chinese linguistic ontology comparable to WordNet. This paper proposes an ontology-theme-based method for extracting these relationships from Chinese natural language documents. The method is the first to bring the notion of themes into domain ontology. The concepts and properties of the original domain ontology are partitioned according to themes, and mappings are established from concepts to themes and from themes to properties. For a sentence being processed, some entities, individuals, and properties are first extracted by simple NER and direct string-to-ontology matching. This correctly extracted information is then used to infer the themes of the sentence, and the themes in turn provide clues for finding further possible relationships. Preliminary experimental results indicate that this theme-based approach achieves higher recall and precision than methods that do not incorporate themes.

Key words: knowledge acquisition, ontology, theme, Chinese
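
The pipeline outlined in the abstract (match strings against the ontology, infer the sentence themes, then use the themes to narrow the candidate relationships) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy ontology, the lexicons, the function names, and in particular the theme-inference rule (intersecting the theme sets of all matched concepts), which the abstract does not specify. Plain substring matching stands in for NER, and direct matching of property strings is omitted for brevity.

```python
from itertools import permutations

# Toy, hand-made ontology fragment. All names below are hypothetical.
CONCEPT_THEMES = {  # concept -> themes it belongs to
    "Person": {"education", "employment"},
    "University": {"education", "employment"},
    "Student": {"education"},
    "Teacher": {"employment"},
}
THEME_PROPERTIES = {  # theme -> properties grouped under it
    "education": {"studiesAt"},
    "employment": {"teachesAt"},
}
PROPERTY_SIGNATURE = {  # property -> (domain concept, range concept)
    "studiesAt": ("Person", "University"),
    "teachesAt": ("Person", "University"),
}
INDIVIDUALS = {"张三": "Person", "吉林大学": "University"}  # surface string -> concept
CONCEPT_TERMS = {"学生": "Student", "教师": "Teacher"}  # surface string -> concept


def match_sentence(sentence):
    """Direct string-to-ontology matching; a stand-in for simple NER."""
    individuals = {s: c for s, c in INDIVIDUALS.items() if s in sentence}
    concepts = {c for s, c in CONCEPT_TERMS.items() if s in sentence}
    return individuals, concepts


def infer_themes(individuals, concepts):
    """Assumed rule: intersect the theme sets of every matched concept."""
    evidence = set(individuals.values()) | concepts
    theme_sets = [CONCEPT_THEMES[c] for c in evidence]
    return set.intersection(*theme_sets) if theme_sets else set()


def candidate_relations(sentence):
    """Return the inferred themes and the triples they permit."""
    individuals, concepts = match_sentence(sentence)
    themes = infer_themes(individuals, concepts)
    # Only properties grouped under an inferred theme are considered.
    allowed = set().union(*(THEME_PROPERTIES[t] for t in themes))
    triples = []
    for (subj, s_type), (obj, o_type) in permutations(individuals.items(), 2):
        for prop in sorted(allowed):
            if PROPERTY_SIGNATURE[prop] == (s_type, o_type):
                triples.append((subj, prop, obj))
    return themes, triples


if __name__ == "__main__":
    print(candidate_relations("张三是吉林大学的学生"))
    print(candidate_relations("张三在吉林大学"))
```

In this toy setup, the sentence containing 学生 ("student") narrows the themes to education, so only the studiesAt triple is proposed. The sentence without such a cue keeps both themes, so both studiesAt and teachesAt remain candidates, since the domain and range types alone cannot tell them apart.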

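Abstract (Chinese version): The characteristics of the Chinese language make acquiring knowledge from Chinese natural language documents very difficult. Although Chinese named entity recognition (NER) now achieves good results, it is almost impossible to correctly extract the relationships between recognized entities without the help of a synonym table or a Chinese linguistic knowledge base similar to WordNet. This paper proposes a method for Chinese knowledge acquisition based on ontology themes. The method introduces the notion of themes into domain ontology for the first time: domain experts partition the concepts and properties of the original domain ontology by theme and establish associations from concepts to themes and from themes to properties. When knowledge is extracted from a sentence, simple NER and direct mapping to the ontology identify some of the concepts, individuals, and properties in the sentence. This accurately identified information is used to determine the theme of the sentence, and the theme in turn provides clues for finding relationships. Preliminary experimental results show that the method achieves better recall and precision than methods that do not use theme information.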

Key words (Chinese version): knowledge acquisition, ontology, theme, Chinese