Using Ordered Mutual Information to Match Schema with Opaque Column Names and Data Values

doi:10.3778/j.issn.1673-9418.1609004

Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue 9: 1389-1397.

GUO Lele+, LIN Youfang, HAN Sheng

Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Online: 2017-09-01 Published: 2017-09-06

利用有序互信息匹配包含非透明列的数据模式

郭乐乐+，林友芳，韩升

北京交通大学计算机与信息技术学院交通数据分析与挖掘北京市重点实验室，北京 100044

Abstract

Schema matching is a key issue in data integration and the core task when merging data from heterogeneous sources. Many schema matching methods have been proposed, but most lack generality because they depend heavily on the descriptive information of the schema, which makes them difficult to apply to other scenarios. To solve this problem, this paper proposes a novel schema matching method that uses ordered mutual information and does not rely on any descriptive information of the schema, such as column names, column types, or foreign key constraints, which gives it strong generality. Extensive experiments on various datasets indicate that the proposed technique outperforms earlier schema matching methods in both efficiency and accuracy.

Key words: schema matching, opaque conditions, mutual information, undirected graph matching
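
As a rough illustration of why mutual information suits opaque columns, the sketch below computes the standard empirical mutual information between two columns of one table. It is not the ordered mutual information proposed in the paper, whose definition the abstract does not give; the function name `mutual_information` and the toy columns are hypothetical. The point it shows is that the score depends only on how values co-occur, so relabeling the values leaves it unchanged.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length columns.

    Standard plug-in estimate, used here only for illustration; this is an
    assumption, not the paper's ordered mutual information.
    """
    if len(xs) != len(ys):
        raise ValueError("columns must have the same number of rows")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)            # marginal counts of the first column
    count_y = Counter(ys)            # marginal counts of the second column
    count_xy = Counter(zip(xs, ys))  # joint counts over row-aligned pairs
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy columns: `country` is fully determined by `city`,
# while `noise` is independent of `city`.
city = ["a", "a", "b", "b"]
country = ["x", "x", "y", "y"]
noise = ["p", "q", "p", "q"]

# Replacing values with arbitrary codes (an "opaque" column) keeps the score.
opaque_city = [{"a": 17, "b": 42}[v] for v in city]

dependent_score = mutual_information(city, country)
independent_score = mutual_information(city, noise)
opaque_score = mutual_information(opaque_city, country)
```

In this toy table the dependent pair scores higher than the independent pair, and the opaque relabeling gives the same score as the original column. A matcher built on this idea could compare such pairwise scores across two tables instead of comparing names or values directly.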

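Abstract (Chinese version): Schema matching is the core step in merging data from heterogeneous sources and a key problem in data integration. Many schema matching methods exist, but a large share of them rely too heavily on schema description information, which limits their generality and makes them hard to apply in other scenarios. This paper therefore proposes a schema matching method that uses ordered mutual information to match schemas with opaque column names and column data values. The method does not depend on schema description information such as column names, column types, or primary and foreign key dependencies, and is therefore highly general. Experiments on multiple datasets show that the method substantially reduces matching time while improving the accuracy of the matching results.

Key words (Chinese version): schema matching, opaque conditions, mutual information, undirected graph matching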

GUO Lele, LIN Youfang, HAN Sheng. Using Ordered Mutual Information to Match Schema with Opaque Column Names and Data Values[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(9): 1389-1397.

郭乐乐，林友芳，韩升. 利用有序互信息匹配包含非透明列的数据模式[J]. 计算机科学与探索, 2017, 11(9): 1389-1397.
