Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue 12: 2094-2102. DOI: 10.3778/j.issn.1673-9418.1810006

Gaussian Kernel Density Estimation Method for Detecting Abnormal Values of Health Data

WANG Kang, ZHOU Zhiping   

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2019-12-01  Published: 2019-12-10

Abstract: With smart wearable devices now widespread, the activity data collected by sports bracelets contain unknown abnormal data. To address this problem, a method for detecting abnormal values in health data based on Gaussian kernel density estimation is proposed. First, t-distributed stochastic neighbor embedding (t-SNE) is used to extract features from the data set and to enhance the local structure of the data. Then the local reachability density in the local outlier factor (LOF) algorithm is replaced by a Gaussian kernel local density, which yields a new outlier factor algorithm called the Gaussian kernel density estimation-based local outlier factor (GKDELOF) algorithm. The stability of the GKDELOF algorithm's decision threshold is derived and analyzed. Finally, the accuracy of the algorithm is verified by a simulation experiment on the UCI standard data set, and an experimental analysis is performed on real activity data collected by sports bracelets. The experimental results show that the method can handle the sparsity of health data caused by complex and diverse activities, and can detect abnormal samples accurately.

Key words: sports bracelet, health data, abnormal data detection, local outlier factor, Gaussian kernel density estimation
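
The core idea in the abstract, replacing the local reachability density of LOF with a Gaussian kernel local density, can be illustrated with a minimal sketch. The abstract does not give the GKDELOF formulas, so everything below is an assumption made for illustration: the density is taken as the mean Gaussian kernel value between a point and its k nearest neighbours, the score is the LOF-style ratio of the neighbours' mean density to the point's own density, and the names `knn_indices`, `gaussian_local_density`, `kernel_outlier_factor`, `k`, and `bandwidth` are hypothetical. The t-SNE feature extraction step is omitted; in practice the input would be the embedded data.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def gaussian_local_density(points, neighbours, bandwidth):
    """Assumed density: mean Gaussian kernel value over the k nearest neighbours."""
    densities = []
    for i, p in enumerate(points):
        total = sum(
            math.exp(-math.dist(p, points[j]) ** 2 / (2.0 * bandwidth ** 2))
            for j in neighbours[i]
        )
        densities.append(total / len(neighbours[i]))
    return densities


def kernel_outlier_factor(points, k=3, bandwidth=1.0):
    """LOF-style score: mean neighbour density divided by the point's own density.

    Scores near 1 suggest an inlier; much larger scores suggest an outlier.
    This is an illustrative variant, not the paper's exact GKDELOF definition.
    """
    neighbours = knn_indices(points, k)
    density = gaussian_local_density(points, neighbours, bandwidth)
    eps = 1e-12  # guards against division by a vanishing density
    scores = []
    for i in range(len(points)):
        mean_nb = sum(density[j] for j in neighbours[i]) / len(neighbours[i])
        scores.append(mean_nb / (density[i] + eps))
    return scores


# Toy data: a tight cluster near the origin plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
scores = kernel_outlier_factor(points, k=3, bandwidth=1.0)
most_abnormal = max(range(len(points)), key=lambda i: scores[i])
print(most_abnormal, points[most_abnormal])
```

On this toy data the distant point receives by far the largest score, while the clustered points score close to 1. How the decision threshold should be chosen, which the paper analyzes, is not covered by this sketch.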
