计算机科学与探索 ›› 2012, Vol. 6 ›› Issue (8): 684-697.DOI: 10.3778/j.issn.1673-9418.2012.08.002

• 学术研究 • 上一篇    下一篇

频繁模式溯源

王  斌+,刘向宇,杨晓春,王国仁   

  1. 东北大学 信息科学与工程学院,沈阳 110819
  • 出版日期:2012-08-01 发布日期:2012-08-06

Tracking Provenance of Frequent Patterns

WANG Bin+, LIU Xiangyu, YANG Xiaochun, WANG Guoren   

  1. School of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Online:2012-08-01 Published:2012-08-06

摘要: 数据起源是关于数据来源、转换和更新过程的研究。基于频繁模式挖掘的性质和特点,提出了FP+树来记录频繁模式来源。给出了频繁模式溯源的相关理论和证明,根据不同追溯机制提出了三种频繁模式溯源方法,并对方法的正确性和执行代价给出了理论证明和推导。在进行频繁模式挖掘时,在不增加额外负担的情况下实现了频繁模式溯源。针对条件FP+树结构特点和频繁模式性质,提出了采用α-剪枝求解条件FP+树的投影操作,加快了频繁模式挖掘和数据溯源的执行效率。实验结果显示,采用基于FP+树的频繁模式溯源方法,可以高效地实现频繁模式溯源,并且条件FP+树的α-剪枝策略的有效性得到验证。

null

关键词: 数据起源, 世系追踪, 频繁模式, 剪枝, 数据挖掘

Abstract: Data provenance studies the processes of data generation, transformation and update. Based on properties and features of frequent patterns (FP), this paper proposes an FP+ tree to store sources of frequent patterns. Then, it gives the related theories and proofs for tracking provenances of frequent patterns, proposes three tracking methods for frequent patterns based on different lineage tracing mechanism, and deduces the correctness and theoretical cost of each method. While mining frequent patterns, the lineages of frequent patterns can be traced without extra cost. According to the structure of conditional FP+ tree and properties of frequent patterns, the paper also proposes α-pruning for efficiently projecting conditional FP+ tree and improving the efficiency of lineage tracing. Experimental results show that the FP+ tree based tracking provenance of frequent patterns can get sources of frequent patterns efficiently, and α-pruning for conditional FP+ tree is efficient.

Key words: data provenance, lineage tracing, frequent patterns, pruning, data mining