Journal of Frontiers of Computer Science and Technology, 2012, Vol. 6, Issue (4): 301-308. DOI: 10.3778/j.issn.1673-9418.2012.04.002

• Academic Research •

Object Coreference Resolution Based on MapReduce

XIE Junkai, HU Wei, BAI Wenyang

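  1. Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
    2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
  • Published: 2012-04-01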

Resolving Object Coreference in Semantic Web Based on MapReduce

XIE Junkai, HU Wei, BAI Wenyang   

  1. Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
    2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
  • Online: 2012-04-01

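Abstract: Object coreference resolution is a key problem in Semantic Web research. Although many different object coreference resolution methods already exist, their efficiency still cannot meet the requirements of practical use. The MapReduce framework is simple and offers strong computing capability, and it has been widely used for all kinds of data-parallel processing tasks. This paper proposes two parallel algorithms for resolving object coreference, each based on one of the two distinct phases of MapReduce. Specifically, given an initial training set and a threshold, the algorithms efficiently discover a set of discriminative properties whose confidence is higher than the predefined threshold. These highly discriminative properties are then used to identify coreferent objects that have similar property values. Experiments on real datasets, whose size was artificially enlarged, verify the effectiveness of the MapReduce-based algorithms.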
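The property-selection step described in the abstract can be illustrated with a single map/shuffle/reduce round, simulated in one Python process. This is a minimal sketch under stated assumptions, not the paper's algorithm or any Hadoop API: the abstract does not define "confidence", so here the confidence of a property is assumed to be the fraction of object pairs sharing a value on that property that are labeled coreferent in the training set. The toy triples, the `POSITIVE` training pairs, and all function names are hypothetical. The strict `>` comparison follows the wording "higher than"; the English abstract says "no less than", which would correspond to `>=`.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (subject, property, value) triples.
TRIPLES = [
    ("a1", "email", "x@ex.org"), ("b1", "email", "x@ex.org"),
    ("a1", "name", "Alice"), ("b1", "name", "Alice"), ("a2", "name", "Alice"),
    ("a2", "email", "y@ex.org"), ("b2", "email", "y@ex.org"),
]
# Hypothetical labeled training set: object pairs known to be coreferent.
POSITIVE = {frozenset(("a1", "b1")), frozenset(("a2", "b2"))}


def map_phase(triples):
    """Map: key each subject by the (property, value) it holds."""
    for subject, prop, value in triples:
        yield (prop, value), subject


def shuffle(mapped):
    """Group mapper output by key, as the framework's shuffle would."""
    groups = defaultdict(set)
    for key, subject in mapped:
        groups[key].add(subject)
    return groups


def reduce_phase(key, subjects, positive):
    """Reduce one (property, value) group: count candidate pairs and labeled coreferent pairs."""
    prop, _value = key
    hits = total = 0
    for pair in combinations(sorted(subjects), 2):
        total += 1
        hits += int(frozenset(pair) in positive)
    return prop, hits, total


def discriminative_properties(triples, positive, threshold):
    """Return {property: confidence} for properties whose assumed confidence exceeds threshold."""
    counts = defaultdict(lambda: [0, 0])  # property -> [coreferent pairs, all candidate pairs]
    for key, subjects in shuffle(map_phase(triples)).items():
        prop, hits, total = reduce_phase(key, subjects, positive)
        counts[prop][0] += hits
        counts[prop][1] += total
    return {p: h / t for p, (h, t) in counts.items() if t and h / t > threshold}


print(discriminative_properties(TRIPLES, POSITIVE, 0.5))
```

On this toy data the call prints `{'email': 1.0}`: both pairs sharing an email are labeled coreferent, whereas only one of the three pairs sharing the name "Alice" is, so `name` (confidence 1/3) falls below the threshold.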

Key words: object coreference resolution, MapReduce, Semantic Web

Abstract: Object coreference resolution is a hot topic in Semantic Web research. Although many novel approaches to resolving object coreference exist, their efficiency still falls far short of practical requirements. The MapReduce framework is now widely used for data processing because of its simplicity and its high computational capability at low cost. This paper proposes two parallel algorithms, corresponding to different phases of MapReduce, to resolve object coreference in the Semantic Web. Specifically, given a labeled training set and a threshold, the algorithms efficiently find the most discriminative properties whose confidences are no less than the predefined threshold. These discriminative properties are then used to identify coreferent objects that hold similar values. The paper reports experiments on real datasets, with the dataset size increased, that evaluate the speed-up and scale-up of the proposed MapReduce algorithms.
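The second step in the abstract, using the selected discriminative properties to identify coreferent objects with similar values, can likewise be sketched as one map/reduce round. This is a hypothetical illustration, not the paper's method: "similar" is assumed to mean equal after trimming whitespace and case-folding (`normalize`), the set of discriminative properties is simply passed in, and matched pairs are merged transitively with a union-find structure, which is a design choice of this sketch. All data and names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, value) triples.
TRIPLES = [
    ("a1", "email", "X@ex.org"), ("b1", "email", " x@ex.org"),
    ("b1", "homepage", "http://ex.org/x"), ("c1", "homepage", "http://ex.org/x"),
    ("a2", "name", "Alice"), ("b2", "name", "Alice"),
]


def normalize(value):
    """Hypothetical notion of 'similar': equal after trimming whitespace and case-folding."""
    return " ".join(value.split()).casefold()


def map_candidates(triples, properties):
    """Map: emit ((property, normalized value), subject) only for the selected properties."""
    for subject, prop, value in triples:
        if prop in properties:
            yield (prop, normalize(value)), subject


def reduce_matches(groups):
    """Reduce: all subjects sharing a (property, value) key are declared coreferent."""
    for subjects in groups.values():
        first, *rest = sorted(subjects)
        for other in rest:
            yield first, other


def coreference_clusters(triples, properties):
    groups = defaultdict(set)  # stands in for the framework's shuffle
    for key, subject in map_candidates(triples, properties):
        groups[key].add(subject)

    parent = {}  # union-find forest: transitive closure of matches (an assumption)

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in reduce_matches(groups):
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for x in list(parent):
        clusters[find(x)].add(x)
    return sorted(sorted(c) for c in clusters.values())


print(coreference_clusters(TRIPLES, {"email", "homepage"}))
```

With `email` and `homepage` as the discriminative properties this prints `[['a1', 'b1', 'c1']]`: `a1` and `b1` match on email, `b1` and `c1` on homepage, and the union-find merges them into one cluster, while `a2` and `b2` are ignored because `name` was not selected.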

Key words: object coreference resolution, MapReduce, Semantic Web