Journal of Frontiers of Computer Science and Technology, 2014, Vol. 8, Issue (3): 266-274. DOI: 10.3778/j.issn.1673-9418.1306048

Results Diversification for Keyword Search Using Semantic Information of Entity

SONG Yuling, WANG Ning+   

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Online: 2014-03-01 Published: 2014-03-05

Abstract: In recent years, keyword search over extensible markup language (XML) data has received broad attention from researchers. As an effective way to help users find what they need more quickly, the diversification of search results has also become an active research topic. Existing methods diversify search results at different granularities, but their results are unsatisfactory. To address this problem, this paper proposes a new method that diversifies search results based on the semantic information of the central entities they describe. First, it analyzes the semantic information that entities contain and defines a formula for computing the semantic similarity between entities based on their features. Then, it clusters the entities by their pairwise semantic similarity and gives a rule for locating the central entity that each search result belongs to. On this basis, search results are classified according to the clusters to which their central entities belong. Each resulting group carries a semantic label that users can follow to navigate the results, which improves search efficiency. Experimental results verify that the method is effective.

Key words: extensible markup language (XML), keyword search, entity, diversification
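
The pipeline outlined in the abstract (score entity similarity, cluster the entities, then group results by the cluster of their central entity) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not state the similarity formula, so Jaccard overlap of term sets stands in for it; the greedy threshold clustering is a generic technique, not necessarily the authors'; and the central entity of each result is taken as given rather than derived by the paper's location rule. All names and sample data are hypothetical.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two term sets (a stand-in, not the paper's formula)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_entities(entities: dict, threshold: float = 0.5) -> dict:
    """Greedy single-pass clustering (assumed technique).

    Each entity joins the first cluster whose seed entity is at least
    `threshold` similar; otherwise it becomes the seed of a new cluster.
    Returns a mapping from entity name to cluster id.
    """
    seeds = []        # list of (cluster_id, seed_terms)
    assignment = {}
    for name, terms in entities.items():
        for cluster_id, seed_terms in seeds:
            if jaccard(terms, seed_terms) >= threshold:
                assignment[name] = cluster_id
                break
        else:
            cluster_id = len(seeds)
            seeds.append((cluster_id, terms))
            assignment[name] = cluster_id
    return assignment


def group_results(results: list, assignment: dict) -> dict:
    """Group (result_id, central_entity) pairs by the central entity's cluster."""
    groups = defaultdict(list)
    for result_id, central_entity in results:
        groups[assignment[central_entity]].append(result_id)
    return dict(groups)


# Hypothetical entities, each described by a set of semantic terms.
entities = {
    "paper_xml_search": {"xml", "keyword", "search"},
    "paper_xml_index": {"xml", "keyword", "index"},
    "paper_graph_mining": {"graph", "mining", "pattern"},
}
# Hypothetical search results, each paired with its (given) central entity.
results = [
    ("r1", "paper_xml_search"),
    ("r2", "paper_graph_mining"),
    ("r3", "paper_xml_index"),
]

assignment = cluster_entities(entities)
groups = group_results(results, assignment)
print(groups)
```

In this toy input the two XML-related entities share two of four distinct terms, so they meet the 0.5 threshold and their results land in the same group, while the graph-mining result forms a group of its own. A semantic label for each group could then be derived from the terms its entities share.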
