Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2016, Vol. 10, Issue (12): 1763-1772. DOI: 10.3778/j.issn.1673-9418.1509075

• Artificial Intelligence and Pattern Recognition •

Outlier Detection Algorithm Based on Dispersion of Neighbors

SHEN Yanhui (沈琰辉), LIU Huawen+ (刘华文), XU Xiaodan (徐晓丹), ZHAO Jianmin (赵建民), CHEN Zhongyu (陈中育)

  1. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
  • Online: 2016-12-01 Published: 2016-12-07

Abstract: Outlier detection is an important task in machine learning and data mining. A major limitation of existing outlier detection methods is that normal points at the border of the data receive high outlier scores, so they can be misclassified as outliers in some situations. To address this problem, this paper proposes a novel outlier detection algorithm based on the dispersion of neighbors. The algorithm takes the dispersion of a data point's neighborhood as its outlier degree, which keeps the outlierness of border points from becoming too high while still separating normal points from outliers. Experimental results show that the proposed algorithm detects outliers effectively, is not sensitive to the choice of parameters, and is stable in performance.

Key words: outlier detection, machine learning, data mining, principal component analysis
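
The abstract does not state how dispersion is computed, so the following is only an illustrative sketch of the general idea, not the paper's formula. It assumes that a point's neighborhood is the point plus its k nearest neighbors under Euclidean distance, and that dispersion is the total variance of that neighborhood (the trace of its covariance matrix, which equals the sum of its PCA eigenvalues). The function name `neighborhood_dispersion_scores` and the parameter `k` are hypothetical.

```python
import numpy as np


def neighborhood_dispersion_scores(X, k=5):
    """Score each row of X by the dispersion of its k-nearest-neighbor set.

    ASSUMPTION: dispersion is taken to be the total variance of the
    neighborhood (trace of the covariance matrix, i.e. the sum of its
    PCA eigenvalues). The paper may define it differently.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)

    # For each point, indices of the k + 1 closest rows. Each point is at
    # distance 0 from itself, so the set holds the point and k neighbors.
    nn = np.argsort(d2, axis=1)[:, : k + 1]

    scores = np.empty(n)
    for i in range(n):
        hood = X[nn[i]]
        centered = hood - hood.mean(axis=0)
        # Mean squared deviation from the neighborhood centroid, summed
        # over all coordinates: the trace of the (biased) covariance.
        scores[i] = (centered ** 2).sum() / len(hood)
    return scores
```

Under these assumptions, a point that lies far from the rest of the data has a neighborhood stretched between itself and the nearest cluster, so its score is large, while a point on the border of a cluster has a compact neighborhood and a score comparable to interior points.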