计算机科学与探索 ›› 2015, Vol. 9 ›› Issue (9): 1075-1083. DOI: 10.3778/j.issn.1673-9418.1410042

• 数据库技术 •

关联规则挖掘算法Apriori的研究改进

周发超1,王志坚1,2+,叶  枫1,2,邓玲玲1   

  1. 河海大学 计算机与信息学院,南京 211100
  2. 南京航空航天大学 计算机科学与技术学院,南京 210016
  • 出版日期:2015-09-01 发布日期:2015-12-11

Research and Improvement of Apriori Algorithm for Mining Association Rules

ZHOU Fachao1, WANG Zhijian1,2+, YE Feng1,2, DENG Lingling1   

  1. College of Computer and Information, Hohai University, Nanjing 211100, China
  2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Online: 2015-09-01 Published: 2015-12-11

摘要: 在关联规则挖掘领域有很多算法,其中最经典的是Apriori算法,该算法可找出所有的频繁项集,并发现项目间的关联关系,但是执行效率却很低。针对经典Apriori算法中存在的I/O过重,产生频繁项集,计算量过大等问题,提出了一种Apriori的改进方案I_Apriori,通过减少扫描数据库次数,降低候选项集计算复杂度以及减少预剪枝步骤计算量等途径提高了算法的执行效率。对比分析了Apriori和I_Apriori算法,I_Apriori算法计算复杂度更低,同时进行了对比实验,结果表明相比于Apriori算法,I_Apriori算法执行效率更高。

关键词: 关联规则, Apriori, I_Apriori, 复杂度, 效率

Abstract: Among the many algorithms for association rule mining, Apriori is the most classic. It finds all frequent itemsets and discovers the associations among items, but its execution efficiency is low. To address the problems of the classic Apriori algorithm, including heavy I/O load and the large amount of computation required to generate frequent itemsets, this paper proposes an improved scheme named I_Apriori. It improves execution efficiency by reducing the number of database scans, lowering the computational complexity of computing candidate itemsets, and reducing the amount of computation in the pre-pruning step. A comparative analysis shows that I_Apriori has lower computational complexity than Apriori, and comparative experiments show that I_Apriori executes more efficiently than Apriori.

Key words: association rule, Apriori, I_Apriori, complexity, efficiency
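
For context, the classic Apriori baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the textbook algorithm, not the I_Apriori scheme proposed in the paper, whose details are not given in the abstract. The function name `apriori`, the parameter `min_support` (an absolute support count) and the sample transactions are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Textbook Apriori sketch; min_support is an absolute count (assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One full pass over the database per level k. These repeated
        # scans are the I/O cost that the abstract describes.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        {"bread", "milk"},
        {"bread", "diaper", "beer"},
        {"milk", "diaper", "beer"},
        {"bread", "milk", "diaper"},
        {"bread", "milk", "beer"},
    ]
    for itemset, count in apriori(sample, min_support=3).items():
        print(sorted(itemset), count)
```

The sketch makes the two costs named in the abstract visible: the database is rescanned once per itemset size, and the join and prune steps enumerate many candidate subsets before any counting takes place.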