计算机科学与探索 ›› 2010, Vol. 4 ›› Issue (11): 1039-1048.DOI: 10.3778/j.issn.1673-9418.2010.11.009

• 学术研究 • 上一篇    下一篇

面向PSTP查询的高效处理算法*

周军锋+;李义国;郭景峰

  

  1. 燕山大学 信息科学与工程学院, 河北 秦皇岛 066004
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2010-11-01 发布日期:2010-11-01
  • 通讯作者: 周军锋

PSTP Query Oriented Efficient Algorithm*

ZHOU Junfeng+;LI Yiguo;GUO Jingfeng

  

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2010-11-01 Published:2010-11-01
  • Contact: ZHOU Junfeng

摘要: 在使用“不完全结构的约束查询(PSTP查询)”从XML文档中获取信息时, 用户可以根据自身对XML文档结构的熟悉程度, 在查询表达式中灵活地嵌入结构约束条件, 从而满足完全不了解、完全了解及了解部分结构信息的各种用户的查询需求。提出一种基于扩展Dewey编码的查询处理算法, 可以在仅扫描一遍元素的情况下, 处理任意形式的PSTP查询。不同数据集上的实验结果表明, EDPS算法在处理twig查询、不包含“*”结点的PSTP查询及包含“*”结点的PSTP查询时, 综合性能明显优于已有方法。

关键词: 可扩展标示语言, PSTP查询, 扩展Dewey

Abstract: When extracting desired information from XML data using partially specified twig pattern (PSTP) queries, users can flexibly use any structural constraints to specify their query semantics, therefore different kinds of users, i.e., users who fully understand or know nothing about or partially understand the underlying structure, can search desired information based on their familiarity with the structure of given XML data. This paper proposes an efficient algorithm, EDPS, based on extended Dewey labeling scheme to process a general PSTP query efficiently by just scanning the input elements only once. The experimental results on various datasets indicate that this method performs significantly better than existing ones when processing twig queries, PSTP queries without “*” nodes and PSTP queries containing “*” nodes.

Key words: Extensible Markup language (XML), partially specified twig pattern (PSTP) queries, extended Dewey

中图分类号: