Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (12): 1485-1493. DOI: 10.3778/j.issn.1673-9418.1409009


Markov Network Information Retrieval Expanded Model Based on Hierarchical Dependence

GAN Lixin1,2+, WAN Changxuan1, WANG Mingwen3   

  1. School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China
    2. School of Math and Computer Science, Jiangxi Science and Technology Normal University, Nanchang 330038, China
    3. School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China
  • Online: 2014-12-01 Published: 2014-12-08

基于层次依赖的Markov网络信息检索扩展模型

甘丽新1,2+,万常选1,王明文3   

  1. 江西财经大学 信息管理学院,南昌 330013
    2. 江西科技师范大学 数学与计算机科学学院,南昌 330038
    3. 江西师范大学 计算机信息工程学院,南昌 330022

Abstract: Query expansion is one of the key technologies for solving the low retrieval efficiency caused by term mismatch between user queries and relevant documents. This paper proposes a Markov network information retrieval expanded model based on hierarchical dependence. The model jointly considers several factors: the hierarchical distance between candidate terms and query terms, the relevance between terms, the out-degree of a term node, and path selection. It reweights candidate terms according to their hierarchical dependence and selects the candidates most relevant to the query for use in the expanded retrieval model, which helps to mine more potential candidates with deeper dependence relations. Experimental results on five standard collections show that the proposed model outperforms the BM25 model without query expansion by 5%-41% in 3-avg and by 5%-70% in 11-avg. Compared with the Markov network information retrieval expanded model based on direct correlation, the proposed model achieves better overall retrieval efficiency.

Key words: hierarchical dependence, Markov network, query expansion, information retrieval

摘要: 查询扩展是解决查询词与相关文档中的词不匹配而导致检索效率低下问题的关键技术之一。提出了基于层次依赖的Markov网络信息检索扩展模型。该模型综合考虑了候选词与查询词的层次距离、词间相关性、词节点的出度和路径等因素,通过层次依赖关系对候选词进行重新加权,选择与查询最为相关的候选词应用于信息检索扩展模型,有利于挖掘出更多潜在的、深层次依赖关系的查询候选词。在5个标准数据集上进行了实验,结果表明基于层次依赖的Markov网络信息检索扩展模型与未进行查询扩展的BM25模型相比,在3-avg和11-avg上分别提高了5%~41%和5%~70%不等,与基于直接相关的Markov网络信息检索扩展模型相比,该模型在总体检索效率上表现更优。

关键词: 层次依赖, Markov网络, 查询扩展, 信息检索
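
The abstract names four factors used to reweight candidate expansion terms (hierarchical distance, term relevance, out-degree, and path selection) but does not give the scoring formula. The following is a minimal Python sketch of how such factors could be combined, not the authors' method. The function name `expand_query`, the parameters `max_depth`, `decay`, and `top_k`, the graph representation, and the scoring rule itself (strongest path weight, divided by out-degree at each hop, damped by depth) are all assumptions made for illustration.

```python
from collections import defaultdict


def expand_query(graph, query_terms, max_depth=2, decay=0.5, top_k=5):
    """Rank candidate expansion terms reachable from the query terms.

    graph: dict mapping term -> {neighbor: relevance in (0, 1]}
    (a hypothetical stand-in for the term layer of a Markov network).
    The scoring rule below is an illustrative assumption, not the
    formula from the paper.
    """
    scores = defaultdict(float)
    for q in query_terms:
        frontier = {q: 1.0}   # term -> strongest path weight from q
        visited = {q}
        for depth in range(1, max_depth + 1):
            next_frontier = {}
            for term, path_weight in frontier.items():
                neighbors = graph.get(term, {})
                out_degree = len(neighbors)
                for nb, rel in neighbors.items():
                    if nb in visited:
                        continue
                    # Relevance is spread over the node's out-edges, so
                    # hub terms pass on less weight per neighbor.
                    w = path_weight * rel / out_degree
                    # Path selection: keep only the strongest path.
                    if w > next_frontier.get(nb, 0.0):
                        next_frontier[nb] = w
            # Hierarchical distance: deeper levels are damped by `decay`.
            for nb, w in next_frontier.items():
                scores[nb] += (decay ** (depth - 1)) * w
            visited.update(next_frontier)
            frontier = next_frontier
    # Original query terms are not expansion candidates.
    for q in query_terms:
        scores.pop(q, None)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    toy_graph = {
        "retrieval": {"search": 0.8, "index": 0.4},
        "search": {"engine": 0.5},
        "index": {"engine": 0.5, "btree": 0.5},
    }
    for term, score in expand_query(toy_graph, ["retrieval"]):
        print(f"{term}\t{score:.3f}")
```

In this toy graph, "engine" is reachable only at depth 2, so a model based on direct correlation alone would never propose it. The sketch still scores it, but below the depth-1 neighbors.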