Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (1): 14-24. DOI: 10.3778/j.issn.1673-9418.1507071

• Database Technology •

XML Schema Matching Based on Multi-Strategy Similarity Integration

FAN Hongjie1, LIU Junfei2+, ZHOU Ludong1, MA Zhiyi1

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  2. National Engineering Research Center for Software Engineering, Peking University, Beijing 100871, China
  • Online: 2016-01-01 Published: 2016-01-07

Abstract: Schema matching discovers semantic correspondences between concepts in different data sources, and it has become an active research topic in data integration, data exchange, and related areas. Researchers have proposed many XML-based schema matching methods that identify semantic correspondences in XML data. XML schema matching still faces challenges, such as how to consider node matching and structure matching together and how to combine multiple similarity measures effectively. To address these problems, this paper computes similarity at both the node level and the structure level of XML, obtains the corresponding similarity matrices, and integrates the two kinds of similarity. It then fits the result through a combination of multiple strategies and an optimization algorithm to obtain an optimized matching result. Finally, a comparison on a benchmark platform shows that the method achieves higher precision and recall than classic schema matching methods.

Key words: data exchange, schema matching, extensible markup language (XML), similarity measure, multi-strategy integration
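
The pipeline outlined in the abstract (node similarity, structure similarity, integration of the two matrices, match selection, precision/recall evaluation) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the measures, the weights, or the optimization step. Every function name here is hypothetical, and so are the specific choices, which are simple stand-ins: a string ratio for node names, Jaccard overlap of ancestor names for structure, a fixed weight of 0.5, and a greedy one-to-one selection in place of the paper's optimization algorithm.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Node-level similarity (assumed measure): ratio of lower-cased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def path_similarity(p, q):
    """Structure-level similarity (assumed measure): Jaccard overlap of
    the ancestor element names on each node's path."""
    sp = {x.lower() for x in p}
    sq = {x.lower() for x in q}
    if not sp and not sq:
        return 1.0
    return len(sp & sq) / len(sp | sq)


def integrate(node_sim, struct_sim, weight=0.5):
    """Weighted sum of two similarity matrices of identical shape."""
    return [
        [weight * n + (1 - weight) * s for n, s in zip(n_row, s_row)]
        for n_row, s_row in zip(node_sim, struct_sim)
    ]


def select_matches(sim, threshold=0.5):
    """Greedy one-to-one selection: take the highest-scoring pairs first.
    This is a simple stand-in and is not guaranteed to be globally optimal."""
    candidates = sorted(
        ((score, i, j)
         for i, row in enumerate(sim)
         for j, score in enumerate(row)
         if score >= threshold),
        reverse=True,
    )
    used_i, used_j, matches = set(), set(), []
    for _score, i, j in candidates:
        if i not in used_i and j not in used_j:
            matches.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return matches


def precision_recall(found, gold):
    """Precision and recall of found pairs against reference pairs."""
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical schemas: (element name, ancestor path) pairs.
    source = [("title", ["book"]), ("author", ["book"])]
    target = [("writer", ["book"]), ("bookTitle", ["book"])]
    node = [[name_similarity(s[0], t[0]) for t in target] for s in source]
    struct = [[path_similarity(s[1], t[1]) for t in target] for s in source]
    print(select_matches(integrate(node, struct), threshold=0.5))
```

Greedy selection is used only to keep the sketch short. A globally optimal one-to-one assignment over the integrated matrix would need a dedicated assignment algorithm, and the weight and threshold would normally be tuned rather than fixed.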