• Artificial Intelligence •

### Parallel Density-Based Clustering Algorithm by Using Weighted Grid and Information Entropy

HU Jian, XU Kaibin, MAO Yimin

1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
2. Department of Information Engineering, College of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
• Online: 2020-12-01 Published: 2020-12-11

Abstract:

Density-based clustering algorithms for big data suffer from unreasonable grid division of the data space, low clustering accuracy and low parallel efficiency. To address these problems, this paper proposes a MapReduce-based density clustering algorithm using weighted grid and information entropy, named DBWGIE-MR. First, an adaptive division grid (ADG) strategy is proposed to divide the data space into grid cells adaptively. Second, a weighted grid construction strategy, neighboring expand (NE), is designed to strengthen the relevance between neighboring grid cells and thereby improve clustering accuracy. Meanwhile, a density calculation strategy based on weighted grid and information entropy (WGIE) is designed to compute the density of each grid cell. In addition, the ε-neighborhood and the core object of the density-based clustering algorithm are redefined so that they suit the weighted grid. Then, the COMCORE-MR (core clusters computing algorithm based on MapReduce) algorithm is proposed to compute local clusters in parallel. Finally, the MECORE-MR (merge core cluster by using MapReduce) algorithm, built on a disjoint-set structure and MapReduce, is proposed to accelerate the convergence of local-cluster merging, which improves the merging efficiency of the density-based clustering algorithm. Experimental results show that DBWGIE-MR produces better clustering results and achieves better parallel performance on large-scale datasets.
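The abstract does not give the ADG, NE or WGIE formulas, so the following is only a hedged illustration of the general idea of grid-based density with neighbor weighting and an entropy term. It maps 2-D points to square cells of a fixed width (the paper's ADG strategy would instead choose the division adaptively), adds a share of the neighboring cells' counts to each cell's own count, and scales the result by the Shannon entropy of the neighbor counts. The function names, the fixed `width`, the `neighbor_weight` factor and the entropy scaling are all hypothetical assumptions, not the paper's WGIE definition.

```python
import math
from collections import Counter
from itertools import product


def assign_cells(points, width):
    """Count how many 2-D points fall into each square grid cell of side `width`."""
    return Counter((math.floor(x / width), math.floor(y / width)) for x, y in points)


def neighbors(cell):
    """Return the 8 cells that surround `cell` in a 2-D grid."""
    cx, cy = cell
    return [(cx + dx, cy + dy)
            for dx, dy in product((-1, 0, 1), repeat=2)
            if (dx, dy) != (0, 0)]


def entropy(counts):
    """Shannon entropy (natural log) of the distribution given by non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def weighted_density(cells, cell, neighbor_weight=0.5):
    """Hypothetical weighted density, NOT the paper's WGIE formula:
    own count plus `neighbor_weight` times the neighbors' counts, scaled by
    (1 + entropy of the neighbor counts)."""
    nbr_counts = [cells.get(n, 0) for n in neighbors(cell)]
    base = cells.get(cell, 0) + neighbor_weight * sum(nbr_counts)
    return base * (1.0 + entropy(nbr_counts))


def core_cells(cells, min_density, neighbor_weight=0.5):
    """Cells whose weighted density reaches `min_density` (a grid analogue of core objects)."""
    return {c for c in cells
            if weighted_density(cells, c, neighbor_weight) >= min_density}
```

With `width=1.0`, the points `(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4), (5.0, 5.0)` fall into cells `(0, 0)`, `(1, 0)` and `(5, 5)`; with `min_density=3.0`, only the first two are reported as core cells. The entropy factor is one possible heuristic: it rewards cells whose neighbors are evenly populated, which tends to hold inside a cluster rather than at its edge.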
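The merging of local clusters with a disjoint-set structure can be illustrated with a generic union-find. This sketch runs in a single process rather than as a MapReduce job, and it assumes a hypothetical merge rule: two local clusters produced by different partitions belong to the same global cluster if they share at least one grid cell (for example, a boundary cell replicated across partitions). The names `DisjointSet` and `merge_local_clusters` are illustrative only; the actual MECORE-MR procedure is not described in the abstract.

```python
class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen elements as singleton sets.
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def merge_local_clusters(local_clusters):
    """local_clusters: dict mapping a local-cluster id to the set of grid cells it contains.
    Under the assumed rule, clusters that share at least one cell end up in one global cluster.
    Returns the global clusters as sorted lists of local-cluster ids."""
    ds = DisjointSet()
    owner = {}  # cell -> first local cluster seen containing it
    for cid, cells in local_clusters.items():
        ds.find(cid)  # register clusters that never share a cell
        for cell in cells:
            if cell in owner:
                ds.union(owner[cell], cid)
            else:
                owner[cell] = cid
    groups = {}
    for cid in local_clusters:
        groups.setdefault(ds.find(cid), set()).add(cid)
    return sorted(sorted(g) for g in groups.values())
```

For `{"p0-c0": {(0, 0), (1, 0)}, "p1-c0": {(1, 0), (2, 0)}, "p1-c1": {(9, 9)}, "p2-c0": {(2, 0), (3, 0)}}` the function returns `[["p0-c0", "p1-c0", "p2-c0"], ["p1-c1"]]`: a chain of shared cells links three local clusters transitively, which union-find resolves in near-linear time instead of repeated pairwise comparison.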