Journal of Frontiers of Computer Science and Technology, 2012, Vol. 6, Issue 10: 935-947. DOI: 10.3778/j.issn.1673-9418.2012.10.009

Results Diversification for Keyword Search on XML Documents

LIU Xiping (1, 2; corresponding author), WAN Changxuan (1, 2), LIU Dexi (1, 2)

  1. School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China
  2. Jiangxi College Key Laboratory of Data and Knowledge Engineering, Nanchang 330013, China
  Published: 2012-10-01; Online: 2012-09-28

Abstract: Keyword search over extensible markup language (XML) documents faces three problems: it returns a large number of results, the results are semantically homogeneous, and they are hard to tell apart. To address these problems, this paper proposes a novel diversification-based method. It first extracts a prototype from each search result to express the semantics of that result. Based on the characteristics of result prototypes, it then defines the interestingness of a prototype and the distance between prototypes, and uses these two measures to diversify the prototypes. The paper further proposes a new way to organize the results of an XML keyword query: grouping the search results by the diversified prototypes. This organization addresses the problems listed above. Experimental results verify the effectiveness of the proposed methods.

Key words: extensible markup language (XML), keyword search, diversification
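The abstract names the ingredients of the method (result prototypes, prototype interestingness, inter-prototype distance, diversification, and grouping results by prototype) but not their formulas. The Python below is only a hypothetical sketch of how such a pipeline could fit together, not the authors' algorithm. All specifics are assumptions: a prototype is modeled as a set of label paths (`Prototype`), interestingness as the fraction of results sharing a prototype, distance as Jaccard distance, and the selection step as a greedy trade-off in the style of maximal marginal relevance (MMR), a standard diversification technique swapped in for illustration and tuned by a hypothetical `lam` weight.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

# Hypothetical model: a result "prototype" is the set of label paths that the
# result subtree contains, e.g. {"paper/title", "paper/author"}. This is an
# illustrative assumption, not the paper's definition.
Prototype = FrozenSet[str]
Result = Tuple[Hashable, Prototype]  # (result id, its prototype)


def jaccard_distance(a: Prototype, b: Prototype) -> float:
    """Assumed prototype distance: 1 - |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def interestingness(results: Sequence[Result]) -> Dict[Prototype, float]:
    """Assumed interestingness: the fraction of results that share a prototype."""
    counts: Dict[Prototype, int] = defaultdict(int)
    for _, proto in results:
        counts[proto] += 1
    return {p: c / len(results) for p, c in counts.items()}


def diversify(scores: Dict[Prototype, float], k: int, lam: float = 0.5) -> List[Prototype]:
    """Greedily pick up to k prototypes, MMR-style (a stand-in technique).

    Each step picks the candidate maximizing
    lam * interestingness + (1 - lam) * distance to the closest chosen prototype.
    """
    # Deterministic order: higher score first, then by sorted label paths.
    remaining = sorted(scores, key=lambda p: (-scores[p], sorted(p)))
    selected: List[Prototype] = []
    while remaining and len(selected) < k:
        def gain(p: Prototype) -> float:
            # Novelty is 1.0 for the first pick, since nothing is chosen yet.
            novelty = min((jaccard_distance(p, s) for s in selected), default=1.0)
            return lam * scores[p] + (1 - lam) * novelty
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


def group_by_prototype(results: Sequence[Result],
                       selected: List[Prototype]) -> Dict[Prototype, List[Hashable]]:
    """Cluster results under the nearest selected prototype (selected must be non-empty)."""
    groups: Dict[Prototype, List[Hashable]] = {p: [] for p in selected}
    for rid, proto in results:
        # Ties go to the prototype that was selected earlier.
        nearest = min(selected, key=lambda s: jaccard_distance(proto, s))
        groups[nearest].append(rid)
    return groups


if __name__ == "__main__":
    p1 = frozenset({"paper/title", "paper/author"})
    p2 = frozenset({"paper/title", "paper/author", "paper/year"})
    p3 = frozenset({"book/title", "book/publisher"})
    results = [("r1", p1), ("r2", p1), ("r3", p2), ("r4", p3)]
    chosen = diversify(interestingness(results), k=2)
    print(chosen == [p1, p3])  # True: the book prototype beats the near-duplicate p2
    print(group_by_prototype(results, chosen)[p1])  # ['r1', 'r2', 'r3']
```

In the toy run, the two `paper` prototypes overlap heavily while the `book` prototype is disjoint from both, so with `k=2` the greedy step keeps the most frequent paper prototype and the book prototype instead of the near-duplicate; grouping then files the remaining paper result (`r3`) under its closest kept prototype. Raising `lam` toward 1 weights interestingness more heavily and weakens the push toward dissimilar prototypes.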