Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (5): 1128-1135. DOI: 10.3778/j.issn.1673-9418.2010055

• Artificial Intelligence •

Detection of Health Data Based on Gaussian Mixture Generative Model

ZHU Zhuangzhuang, ZHOU Zhiping

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received:2020-10-20 Revised:2021-02-02 Online:2022-05-01 Published:2022-05-19
  • About author:ZHU Zhuangzhuang, born in 1995, M.S. candidate, student member of CCF. His research interests include control engineering and application.
    ZHOU Zhiping, born in 1962, Ph.D., professor. His research interests include detection technology and automation device, information security, etc.

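Gaussian Mixture Generative Model for Detecting Anomalies in Health Data (高斯混合生成模型检测健康数据异常)

ZHU Zhuangzhuang (朱壮壮), ZHOU Zhiping (周治平)

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Corresponding author: E-mail: 1543428968@qq.com
  • About the authors: ZHU Zhuangzhuang (1995-), male, M.S. candidate, student member of CCF. His main research interests are control engineering and its applications.
    ZHOU Zhiping (1962-), male, Ph.D., professor. His main research interests include detection technology and automation devices, information security, etc.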

Abstract:

Sports bracelets provide rich information for a comprehensive understanding of people’s physical health now that smart wearable devices are widespread. However, the multidimensional activity data they provide inevitably contain unknown outliers, so outlier detection is necessary. Because of the curse of dimensionality, density estimation with traditional methods is difficult, which leads to poor detection performance. To address this problem, a method for detecting anomalies in health data is used, called the Gaussian mixture generative model (GMGM). First, the model trains a variational autoencoder (VAE) on the original data and extracts latent features by minimizing the reconstruction error. Then, a deep belief network (DBN) predicts the mixture membership of each sample from the latent distribution and the extracted features. Next, the VAE, the DBN and the Gaussian mixture model (GMM) are optimized jointly to avoid the influence of model decoupling. Finally, the GMM predicts the density of each sample point, and samples whose density is higher than the threshold obtained in the training stage are regarded as outliers. The performance of GMGM is verified on the ODDS standard datasets. The results show that the model improves the AUC score by 5.5 percentage points on average compared with the deep autoencoding Gaussian mixture model (DAGMM). Experimental results on real datasets also show the effectiveness of GMGM.
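
The final scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the VAE and DBN are omitted, the latent features and soft memberships are hand-written toy values, the covariance is assumed diagonal to keep the code dependency-free, and all function names (`fit_gmm`, `sample_energy`, `energy_threshold`) are hypothetical. The abstract speaks of a density compared against a training-stage threshold; the sketch assumes the score is the DAGMM-style sample energy (negative log-likelihood under the mixture), so a higher score means a less likely sample.

```python
import math

def fit_gmm(features, memberships):
    """Estimate mixture weights, means and diagonal variances from soft memberships.

    Assumes every component receives a non-zero total membership.
    """
    n, d, k = len(features), len(features[0]), len(memberships[0])
    weights, means, variances = [], [], []
    for j in range(k):
        total = sum(m[j] for m in memberships)
        mean = [sum(m[j] * z[i] for m, z in zip(memberships, features)) / total
                for i in range(d)]
        # Small constant keeps variances strictly positive (an assumption of this sketch).
        var = [sum(m[j] * (z[i] - mean[i]) ** 2
                   for m, z in zip(memberships, features)) / total + 1e-6
               for i in range(d)]
        weights.append(total / n)
        means.append(mean)
        variances.append(var)
    return weights, means, variances

def sample_energy(z, weights, means, variances):
    """Negative log-likelihood of z under the mixture, using log-sum-exp for stability."""
    log_terms = []
    for w, mean, var in zip(weights, means, variances):
        log_pdf = sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                      for x, m, v in zip(z, mean, var))
        log_terms.append(math.log(w) + log_pdf)
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))

def energy_threshold(energies, quantile=0.95):
    """Pick the training-stage threshold as an empirical quantile of the energies."""
    ordered = sorted(energies)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]

# Toy 2-D latent features forming two clusters (stand-ins for the VAE output).
train_z = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
# Toy soft memberships (stand-ins for the DBN output), one row per sample.
train_gamma = [[0.95, 0.05]] * 4 + [[0.05, 0.95]] * 4

weights, means, variances = fit_gmm(train_z, train_gamma)
train_energies = [sample_energy(z, weights, means, variances) for z in train_z]
threshold = energy_threshold(train_energies)

for point in ([0.1, 0.1], [10.0, -5.0]):
    energy = sample_energy(point, weights, means, variances)
    print(point, "outlier" if energy > threshold else "normal")
```

The quantile used for the threshold (0.95 here) is an arbitrary choice for illustration; in practice it would be set from the expected proportion of anomalies in the training data.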

Key words: variational autoencoder (VAE), deep belief network (DBN), Gaussian mixture model (GMM), health data, anomaly detection

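Abstract (translated from the Chinese):

With the popularity of smart wearable devices, sports bracelets provide a rich source of information for a comprehensive understanding of people's physical condition. However, the multidimensional activity data they provide contain unknown outliers, so outlier detection is required. Because of the curse of dimensionality, density estimation with traditional methods is very difficult, which leads to poor detection results. To address this problem, a health data detection method based on a Gaussian mixture generative model (GMGM) is used. First, the model trains a variational autoencoder (VAE) on the original data and extracts latent features by reducing the reconstruction error. Then, a deep belief network (DBN) predicts the mixture membership of each sample from the latent distribution and the extracted features. Next, the VAE, the DBN and the Gaussian mixture model (GMM) are optimized jointly, which avoids the influence of model decoupling. The GMM predicts the sample density of each data point, and samples whose density is higher than the training-stage threshold are regarded as anomalies. The model is evaluated on the ODDS standard datasets, and the results show that, compared with the deep autoencoding Gaussian mixture model (DAGMM), GMGM improves the AUC by 5.5 percentage points on average. Finally, experiments on real datasets also demonstrate the effectiveness of the method.

Key words: variational autoencoder (VAE), deep belief network (DBN), Gaussian mixture model (GMM), health data, anomaly detection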

CLC Number: