计算机科学与探索 ›› 2015, Vol. 9 ›› Issue (4): 429-437.DOI: 10.3778/j.issn.1673-9418.1407034

• 数据库技术 • 上一篇    下一篇

高效的稀有序列模式挖掘方法

雷  雨1+,李  曼2,胡卫松2,宋国杰1,谢昆青1   

  1. 1. 北京大学 机器感知与智能教育部重点实验室,北京 100871
    2. NEC中国研究院,北京 100084
  • 出版日期:2015-04-01 发布日期:2015-04-02

Efficient Methods for Rare Sequential Pattern Mining

LEI Yu1+, LI Man2, HU Weisong2, SONG Guojie1, XIE Kunqing1   

  1. 1. Key Laboratory of Machine Perception, Ministry of Education, Peking University, Beijing 100871, China
    2. NEC Labs China, Beijing 100084, China
  • Online:2015-04-01 Published:2015-04-02

摘要: 序列模式挖掘是数据挖掘领域的一个经典研究问题,目前的研究主要关注于频繁序列模式的挖掘。但是不频繁的序列模式,即“稀有序列模式(rare sequential pattern,RSP)”也可能蕴含着一些不寻常的规律,具有更高的挖掘价值。因此,给出了稀有序列模式挖掘的定义,并且提出了两种逐层挖掘稀有序列模式完全集的方法。为克服挖掘稀有序列模式全集时产生的组合爆炸问题,提出了一种高效的基于二分查找的算法来挖掘“最小稀有序列模式(minimal rare sequential pattern,MRSP)”全集,它包含了稀有序列模式全集的完整信息。通过实验验证了提出的算法可以有效地挖掘稀有序列模式。

关键词: 稀有序列模式, 模式挖掘, 逐层查找, 二分查找

Abstract: Sequential pattern mining is an important subject of data mining with a wide application range. Previous studies in this field are mostly dedicated to mining frequent sequential patterns. On the contrary, the infrequent sequential patterns, say, rare sequential patterns (RSP), may reveal the uncommon regularities, so the rare sequences may be of higher interests to analysts. This paper defines the problem of mining rare sequential patterns, and proposes two level-wise algorithms for mining the complete set of all rare sequential patterns. Moreover, in order to overcome the problem of combinatorial explosion when mining the full set of rare sequential patterns, this paper proposes a binary search based algorithm to mine only the set of minimal rare sequential patterns (MRSP), which contains the information of all rare sequential patterns. The experimental results show that the proposed algorithms serve as effective solutions to the problem of mining rare sequential patterns.

Key words: rare sequential pattern, pattern mining, level-wise search, binary search