Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2009, Vol. 3, Issue (6): 656-664. DOI: 10.3778/j.issn.1673-9418.2009.06.011

• Academic Research •

Mining Frequent Co-location Patterns from Uncertain Data

LU Ye, WANG Lizhen+, ZHANG Xiaofeng   

  1. Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650091, China
  • Online: 2009-11-15 Published: 2009-11-15
  • Corresponding author: LU Ye

Abstract: This paper studies the problem of mining frequent co-location patterns from uncertain data whose instance locations are described by probability density functions (PDFs). The classical Join-based algorithm for mining frequent co-location patterns is extended to the UJoin-based algorithm, which handles uncertain instances. The UJoin-based algorithm is shown to be very inefficient because it computes expected distances (ED) between instances; for arbitrary PDFs, expected distances are computed by numerical integration, which is a costly operation. Two pruning techniques are introduced to avoid these expensive ED calculations: bounding rectangle (BR) pruning and triangle inequality pruning, with the latter examined for 1, 5, and 9 anchor points. Extensive experiments show that the pruning strategies avoid a large number of ED calculations and improve the efficiency of the algorithm.

Key words: uncertain data, co-location patterns, UJoin-based algorithm, BR pruning, triangle inequality pruning
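
The pruning idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each uncertain instance is approximated by a finite list of equally weighted 2D sample points (a discretization of its PDF), two instances count as neighbors when their expected distance is at most a threshold `d`, and all function names are hypothetical. Bounding rectangles give a lower and an upper bound on ED; for a fixed anchor point `p`, the triangle inequality gives |ED(a,p) - ED(b,p)| <= ED(a,b) <= ED(a,p) + ED(b,p). The full ED is computed only when neither bound settles the test.

```python
import math
from itertools import product

def expected_distance(a, b):
    """ED of two instances given as equally weighted sample points (assumed model)."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def bounding_rect(samples):
    """Axis-aligned bounding rectangle (xmin, ymin, xmax, ymax) of the samples."""
    xs, ys = zip(*samples)
    return min(xs), min(ys), max(xs), max(ys)

def rect_min_max_dist(r, s):
    """Smallest and largest distance between any point of r and any point of s."""
    # Gap along each axis; zero when the intervals overlap.
    dx = max(r[0] - s[2], s[0] - r[2], 0.0)
    dy = max(r[1] - s[3], s[1] - r[3], 0.0)
    # Farthest separation along each axis.
    fx = max(r[2] - s[0], s[2] - r[0])
    fy = max(r[3] - s[1], s[3] - r[1])
    return math.hypot(dx, dy), math.hypot(fx, fy)

def anchor_ed(samples, anchor):
    """ED from an instance to a fixed anchor point: linear in the sample count.
    A real implementation would precompute this once per instance and anchor."""
    return sum(math.dist(p, anchor) for p in samples) / len(samples)

def is_neighbor(a, b, d, anchors=()):
    """Return (decision, step), where step names the test that decided."""
    lo, hi = rect_min_max_dist(bounding_rect(a), bounding_rect(b))
    if hi <= d:          # even the farthest sample pair is within d
        return True, "BR"
    if lo > d:           # even the closest sample pair is beyond d
        return False, "BR"
    for p in anchors:
        ea, eb = anchor_ed(a, p), anchor_ed(b, p)
        if ea + eb <= d:         # triangle-inequality upper bound on ED
            return True, "triangle"
        if abs(ea - eb) > d:     # triangle-inequality lower bound on ED
            return False, "triangle"
    # No bound was conclusive: fall back to the quadratic-cost ED.
    return expected_distance(a, b) <= d, "ED"
```

For example, with `a = [(0.0, 0.0), (1.0, 10.0)]`, `b = [(0.5, 5.0)]`, and `d = 2.0`, the bounding rectangles overlap, so BR pruning is inconclusive, but the anchor `(0.5, 5.0)` rejects the pair through the triangle-inequality lower bound without a full ED computation.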
