Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (9): 1582-1592. DOI: 10.3778/j.issn.1673-9418.1806067

• Artificial Intelligence and Pattern Recognition •

Research on Wireless City Community Detection: Using Improved Association Rules to Achieve Community Detection Algorithm on Spark

WANG Yonggui (王永贵), XU Shanshan (徐山珊), XIAO Chenglong (肖成龙)

  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online: 2019-09-01 Published: 2019-09-06

Abstract: Community discovery algorithms suffer from redundant results and high time complexity. Association rules are an effective approach to community discovery, but they face the bottleneck of a large number of iterative computations. To address these problems, this paper proposes SIACD (Spark-based use of improved Apriori to achieve community detection), an improved community discovery algorithm. The algorithm introduces the concepts of the MAC (media access control) address and the Boolean matrix to preprocess the data, improves the Apriori algorithm with an item-count-based Boolean vector intersection operation, parallelizes the computation on Spark, and mines the wireless community data by means of association rules. Experimental results show that SIACD addresses the problems of redundant results, high complexity, and iterative computation, and that it improves both the mining speed of community discovery and the ability to process big data.

Key words: community discovery, association rules, media access control (MAC) address, Boolean matrix, Spark
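
The Boolean vector intersection idea named in the abstract can be illustrated with a small single-machine sketch. This is not the authors' SIACD implementation: the abstract does not say how transactions are formed, so the sketch assumes each transaction is the set of MAC addresses observed together (for example, at one access point in one time window). The MAC addresses, the function names `build_boolean_columns` and `frequent_itemsets`, and the support threshold are all hypothetical. The item-count-based refinement and the Spark parallelization are not modeled here.

```python
from itertools import combinations


def build_boolean_columns(transactions):
    """Build one Boolean column per item, stored as a Python int bit vector.

    Bit i of an item's vector is set when transaction i contains that item.
    (Assumption: a transaction is a set of MAC addresses seen together.)
    """
    columns = {}
    for row, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns


def popcount(vector):
    """Number of set bits, i.e. the support count of an itemset."""
    return bin(vector).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori where support comes from bitwise AND of vectors."""
    columns = build_boolean_columns(transactions)
    # Level 1: keep single items that meet the support threshold.
    current = {
        frozenset([item]): vec
        for item, vec in columns.items()
        if popcount(vec) >= min_support
    }
    result = {}
    while current:
        for itemset, vec in current.items():
            result[itemset] = popcount(vec)
        next_level = {}
        # Join two frequent k-itemsets that differ in exactly one item.
        for (a, vec_a), (b, vec_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Intersection of the two Boolean vectors replaces a data rescan.
            vec = vec_a & vec_b
            if popcount(vec) >= min_support:
                next_level[candidate] = vec
        current = next_level
    return result


# Hypothetical MAC addresses, for illustration only.
A, B, C = "00:00:00:00:00:01", "00:00:00:00:00:02", "00:00:00:00:00:03"
transactions = [{A, B, C}, {A, B}, {A, C}, {B, C}, {A, B, C}]
result = frequent_itemsets(transactions, min_support=3)
```

In this toy data each pair of addresses co-occurs in three transactions and is kept, while the full triple occurs in only two and is discarded. Because each candidate's support is read from a bitwise AND of vectors that are already in memory, no pass over the raw transactions is needed after the Boolean columns are built.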