Variational Deep Generative Clustering Model Under Entropy Regularizations

doi:10.3778/j.issn.1673-9418.2104091

Abstract

Abstract: Clustering methods based on deep learning can automatically learn the latent features of data and generalize readily to large-scale, high-dimensional datasets. Traditional deep clustering methods focus on extracting hidden-layer features with deep neural networks to improve clustering accuracy, and pay less attention to how certain the category assignments are in clustering tasks. They also lack an analysis of the distribution of the discrete latent vector after constraints are imposed. This paper proposes a variational deep generative clustering model under entropy regularizations (VDGC-ER), which uses the variational autoencoder as its basic framework and introduces a Gaussian mixture model as the prior of the latent variables. This paper first applies a sample entropy regularization term to the discrete latent vector of the Gaussian mixture model to improve the clustering accuracy of the model. Further, this paper defines an aggregated sample entropy regularization term on the discrete latent vector to reduce clustering imbalance, which helps the model avoid local optima and improves generative diversity. Then, this paper uses Monte Carlo sampling and the reparameterization strategy to estimate the optimization objective of the VDGC-ER model, and uses stochastic gradient descent to compute the model parameters. Finally, this paper designs comparison experiments on the MNIST, REUTERS, REUTERS-10K and HHAR datasets to evaluate the performance of the VDGC-ER model. Experimental results show that the model can both generate high-quality samples and achieve high clustering accuracy.

Key words: variational autoencoder, probabilistic generative model, variational inference, entropy regularization, clustering
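
The abstract names two entropy terms on the discrete latent vector without giving their formulas. The following is a minimal sketch under stated assumptions, not the paper's definition: it assumes the sample entropy is the mean Shannon entropy of each sample's cluster-assignment distribution, and the aggregated sample entropy is the Shannon entropy of the assignment distribution averaged over a batch. The function names and the array `q` (shape N x K, one row of cluster probabilities per sample) are hypothetical.

```python
import numpy as np


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) of probability vectors along `axis`."""
    p = np.clip(p, eps, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=axis)


def sample_entropy(q):
    """Assumed form: mean per-sample entropy of assignments q, shape (N, K)."""
    return float(np.mean(entropy(q)))


def aggregated_sample_entropy(q):
    """Assumed form: entropy of the batch-averaged assignment distribution."""
    return float(entropy(np.mean(q, axis=0)))


# Confident and balanced: each sample is sure, clusters are used equally.
q_balanced = np.array([[1.0, 0.0], [0.0, 1.0]])
# Confident but collapsed: every sample lands in the same cluster.
q_collapsed = np.array([[1.0, 0.0], [1.0, 0.0]])
# Uncertain: no sample prefers either cluster.
q_uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])

for name, q in [("balanced", q_balanced),
                ("collapsed", q_collapsed),
                ("uncertain", q_uncertain)]:
    print(name, sample_entropy(q), aggregated_sample_entropy(q))
```

Under these assumed definitions, a low sample entropy corresponds to confident per-sample assignments, and a high aggregated sample entropy corresponds to balanced cluster usage. This is consistent with the roles the abstract assigns to the two regularization terms, though the paper's exact weighting and signs are not stated here.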

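Abstract (translated from the Chinese): Clustering methods based on deep learning can automatically learn latent feature representations of data and are readily applied to high-dimensional, large-scale datasets. Traditional deep clustering methods focus mainly on extracting latent features with deep neural networks to improve clustering accuracy. They rarely analyze the certainty of data categories in clustering tasks, and they lack an analysis of the distribution of the discrete latent vector after constraints are imposed. This paper proposes a variational deep generative clustering model under entropy regularizations (VDGC-ER), which takes the variational autoencoder as its basic framework, models the continuous vector with a Gaussian mixture prior, and uses the discrete latent vector of the Gaussian mixture as the category vector. A sample entropy regularization term introduced on the discrete latent vector sharpens the discrimination of the predicted cluster categories, while an aggregated sample entropy regularization term defined on the same vector reduces clustering imbalance, avoids local optima, and increases the diversity of the generated data. Monte Carlo sampling and the reparameterization strategy are then used to estimate the optimization objective of the VDGC-ER model, and stochastic gradient descent is used to solve for the model parameters. Finally, comparison experiments on the MNIST, REUTERS, REUTERS-10K and HHAR datasets verify that the VDGC-ER model not only generates high-quality samples but also significantly improves clustering accuracy.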

ZHANG Zhiyuan, CHEN Yarui, YANG Jianning, DING Wenqiang, YANG Jucheng. Variational Deep Generative Clustering Model Under Entropy Regularizations[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(2): 376-384.
