Journal of Frontiers of Computer Science and Technology ›› 2010, Vol. 4 ›› Issue (2): 124-133.DOI: 10.3778/j.issn.1673-9418.2010.02.004

• Academic Research •

I/O and CPU Balanced XML Keyword Retrieval

LI Qiushi1,2+, WANG Qiuyue1,2, WANG Shan1,2   

  1. Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Renmin University of China, Beijing 100872, China
  2. School of Information, Renmin University of China, Beijing 100872, China
  • Online: 2010-02-15 Published: 2010-02-15
  • Contact: LI Qiushi

平衡I/O和CPU的XML关键词检索

李求实1,2+,王秋月1,2,王 珊1,2   

  1. 中国人民大学 数据工程与知识工程教育部重点实验室,北京 100872
  2. 中国人民大学 信息学院,北京 100872
  • 通讯作者: 李求实

Abstract: The widespread adoption of XML (Extensible Markup Language) has made XML retrieval a new research focus in information retrieval. The internal structure of XML documents can greatly improve retrieval precision. However, the finer retrieval granularity (elements or passages instead of whole documents) and the more complex scoring and ranking models (e.g., a language model combined with a hierarchical inference network) turn traditionally I/O-intensive information retrieval applications into CPU-bound ones. In response to this shift, a new query processing framework for XML retrieval is proposed. It builds two indexes over the XML corpus and, according to the current state of the system, schedules subtasks to evaluate queries against different indexes, dynamically balancing the I/O and CPU workloads to minimize the average response time per query.

Key words: Extensible Markup Language (XML), structural retrieval, I/O, language model

摘要: 随着XML在数据交换和数据存储中的普遍应用,基于XML文档的信息检索研究逐渐成为新的研究热点。XML文档本身含有的结构信息可以使其检索精度得到很大提高,但相应地,XML检索中使用的较复杂的评分模型(如组合语言模型和推理网络的结构化评分模型)和较细的返回结果粒度(由文档转变为元素或者段落),也使得传统的信息检索由I/O密集型应用转变为CPU密集型应用。针对上述应用特点的转变,提出了一种新的检索处理框架,即保存数据的两种索引形式,根据系统的状态动态地调整任务调度,平衡I/O和CPU的处理,以达到减少单个查询的平均响应时间的目的。

关键词: 可扩展标记语言, 结构化检索, 输入/输出, 语言模型
