Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2019, Vol. 13, Issue (4): 586-595. DOI: 10.3778/j.issn.1673-9418.1806029

• Data Mining •

深度卷积自编码图像聚类算法

谢娟英+,侯  琦,曹嘉文   

  1. 陕西师范大学 计算机科学学院,西安 710119
  • 出版日期:2019-04-01 发布日期:2019-04-10

Image Clustering Algorithms by Deep Convolutional Autoencoders

XIE Juanying+, HOU Qi, CAO Jiawen   

  1. School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
  • Online: 2019-04-01 Published: 2019-04-10

摘要: 针对现有深度卷积嵌入聚类算法(deep convolutional embedded clustering,DCEC)的网络特征损失过大,对复杂图像没有提取有效特征的问题,提出一个具有17层网络结构的无监督深度聚类框架,并在编码层加入下采样层,减少参数和防止过拟合;在解码层加入上采样层还原下采样造成的细节损失。分别结合DEC(deep embedded clustering)算法的损失函数和IDEC(improved deep embedded clustering)算法的采用局部结构保留优势的损失函数,得到两种基于卷积自编码的深度学习图像聚类算法DEC_DCNN(deep embedded clustering based on deep convolutional neural network)和IDEC_DCNN(improved deep embedded clustering based on deep convolutional neural network),并使用自适应矩估计(adaptive moment estimation,Adam)和小批量随机梯度下降(mini-batch stochastic gradient descent,mini-batch SGD)两种优化方法调整模型参数。3个经典图像数据集的实验结果显示,提出的17层网络结构对图像特征具有很好的鲁棒性和通用性,基于该网络结构的深度聚类算法取得了远优于现有深度聚类算法的结果,其聚类准确率均优于对比算法;对深度聚类算法DEC_DCNN和IDEC_DCNN的聚类结果准确率、指标值AMI(adjusted mutual information)和ARI(adjusted Rand index)进行比较,IDEC_DCNN比DEC_DCNN的聚类性能更好,说明IDEC_DCNN算法的性能更优越。

关键词: 深度图像聚类, 卷积自编码, 卷积神经网络(CNN), 深度学习, 聚类

Abstract: The deep convolutional embedded clustering (DCEC) algorithm loses too much feature information in its network and fails to extract effective features from complex images. To address this, this paper proposes a 17-layer deep network framework for unsupervised deep image clustering, in which subsampling layers are embedded in the encoder to reduce the number of parameters and prevent overfitting, and up-sampling layers are embedded in the decoder to restore the detail lost through subsampling. Combining this framework with the loss function of deep embedded clustering (DEC) and with the loss function of improved deep embedded clustering (IDEC), which has the advantage of preserving local structure, yields two image clustering algorithms based on deep convolutional autoencoders: DEC_DCNN (deep embedded clustering based on deep convolutional neural network) and IDEC_DCNN (improved deep embedded clustering based on deep convolutional neural network). Adam (adaptive moment estimation) and mini-batch SGD (mini-batch stochastic gradient descent) are adopted to optimize the parameters of the proposed algorithms. Experiments on three typical image datasets demonstrate that the proposed 17-layer network framework is robust and general with respect to image features. DEC_DCNN and IDEC_DCNN, both built on this framework, achieve higher clustering accuracy (ACC) than the existing clustering algorithms they are compared with. IDEC_DCNN outperforms DEC_DCNN in terms of AMI (adjusted mutual information), ARI (adjusted Rand index) and ACC, which further demonstrates the advantages of IDEC_DCNN.
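The abstract names the DEC and IDEC loss functions without stating them. As a rough illustration only, the sketch below follows the formulation commonly given in the DEC and IDEC literature: a Student's t soft assignment of embedded points to cluster centers, a sharpened target distribution, a KL-divergence clustering loss, and, for the IDEC-style variant, a reconstruction term added to it. The function names, `alpha=1.0`, `gamma=0.1`, and the use of mean squared error for reconstruction are assumptions made for illustration, not details taken from this paper, and the 17-layer network itself is not modeled.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embedded points z_i to centers mu_j."""
    # Squared Euclidean distance between every point and every center.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize each row so the assignments of one point sum to 1.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij: square q_ij, divide by cluster frequency, renormalize."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), the DEC-style clustering loss (eps guards against log(0))."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def idec_style_loss(x, x_rec, p, q, gamma=0.1):
    """IDEC-style loss: reconstruction error plus gamma times the clustering loss.

    gamma and the mean-squared-error reconstruction term are assumed values.
    """
    recon = float(np.mean((x - x_rec) ** 2))
    return recon + gamma * kl_clustering_loss(p, q)
```

In this sketch, `z` would be the encoder output for a batch, `centers` the current cluster centers, and `x_rec` the decoder output. Keeping the reconstruction term is what is usually meant by the local-structure-preserving property of the IDEC loss.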

Key words: deep image clustering, convolutional autoencoders, convolutional neural network (CNN), deep learning, clustering
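The abstract reports clustering accuracy (ACC) and ARI but does not define them. A minimal sketch of how these two metrics are commonly computed is given below, assuming integer labels and at least two samples. ACC is taken here as the best accuracy over all one-to-one mappings between predicted cluster ids and true labels, found by brute force for readability. ARI uses the standard pair-counting formula. The function names are hypothetical, AMI is omitted for brevity, and none of this is taken from the paper's own evaluation code.

```python
from collections import Counter
from itertools import permutations
from math import comb

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy over one-to-one mappings of cluster ids to labels.

    Brute force over permutations: only practical for a small number of clusters.
    """
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    # Pad the shorter side with placeholders so both have the same length.
    k = max(len(true_ids), len(pred_ids))
    true_ids += [None] * (k - len(true_ids))
    pred_ids += [None] * (k - len(pred_ids))
    counts = Counter(zip(y_pred, y_true))  # co-occurrence of (cluster, label)
    best = 0
    for perm in permutations(true_ids):
        hits = sum(counts[(c, t)] for c, t in zip(pred_ids, perm))
        best = max(best, hits)
    return best / len(y_true)

def adjusted_rand_index(y_true, y_pred):
    """Adjusted Rand index from pair counts in the contingency table."""
    n = len(y_true)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(y_true, y_pred)).values())
    sum_a = sum(comb(v, 2) for v in Counter(y_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(y_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For datasets with many clusters, the permutation search is usually replaced by the Hungarian algorithm on the same co-occurrence counts, which finds the same optimal mapping in polynomial time.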