Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2011, Vol. 5, Issue 5: 467-473.

• Academic Research •

Research on Nearest Neighbors Classification Techniques

ZHONG Zhi1, ZHU Manlong2+, ZHANG Chen2, HUANG Liangchang2

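  1. College of Computer and Information Technology, Guangxi Teachers Education University, Nanning 530023, China
    2. College of Computer Science and Information Technology, Guangxi Normal University, Guilin, Guangxi 541004, China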
  • Online: 2011-05-01  Published: 2011-05-01

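Abstract: This paper studies nearest-neighbor classification. Applying the idea of shelly nearest neighbors, it builds a classification model and designs a new shelly nearest neighbor (SNN) classification algorithm that overcomes the possible bias of the k nearest neighbor (kNN) classification algorithm in selecting nearest neighbors. The paper analyzes the traditional kNN algorithm and the newly constructed SNN algorithm in terms of their underlying ideas and key techniques, and compares their classification accuracy in experiments on real UCI data sets. From this analysis it identifies the conditions under which each algorithm is suitable and discusses possible reasons. Finally, it concludes that the SNN classification algorithm is insensitive to the distance metric and achieves better classification accuracy on large data sets.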

Key words: classification, k nearest neighbor algorithm, shelly nearest neighbor algorithm, classification accuracy
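The abstract contrasts kNN, whose k closest neighbors may all lie on one side of the query, with SNN, whose neighbors are meant to surround it. The abstract does not state SNN's selection rule, so the Python sketch below assumes one simplified reading of the shell idea: for each attribute, take the nearest training point on each side of the query's value and let the union of those points vote. The function names (`knn_predict`, `shell_neighbors`, `snn_predict`, `euclidean`, `manhattan`) and the toy data are hypothetical; this is an illustration under that assumption, not the authors' algorithm.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance, used here only to show a swappable metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def majority_vote(labels):
    # Most frequent label; on a tie, Counter.most_common keeps first-seen order.
    return Counter(labels).most_common(1)[0][0]


def knn_predict(train_X, train_y, query, k=3, dist=euclidean):
    # Standard kNN: rank all training points by distance and let the k closest vote.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return majority_vote([train_y[i] for i in order[:k]])


def shell_neighbors(train_X, query, dist=euclidean):
    # ASSUMPTION: a simplified shell rule, not necessarily the paper's definition.
    # For each attribute j, keep the closest point with x[j] <= query[j] (left side)
    # and the closest point with x[j] >= query[j] (right side), so the selected
    # neighbors surround the query instead of clustering on one side of it.
    chosen = set()
    for j in range(len(query)):
        left = [i for i, x in enumerate(train_X) if x[j] <= query[j]]
        right = [i for i, x in enumerate(train_X) if x[j] >= query[j]]
        for side in (left, right):
            if side:  # a side may be empty when the query lies at the data's edge
                chosen.add(min(side, key=lambda i: dist(train_X[i], query)))
    return sorted(chosen)


def snn_predict(train_X, train_y, query, dist=euclidean):
    # Hypothetical SNN-style classifier: majority vote over the shell neighbors.
    return majority_vote([train_y[i] for i in shell_neighbors(train_X, query, dist)])


# Toy data (invented for illustration): class A clusters to the right of the query.
train_X = [(1.0, 0.1), (1.1, 0.1), (1.2, 0.1), (-1.5, 0.1), (0.5, -1.4)]
train_y = ["A", "A", "A", "B", "B"]
query = (0.0, 0.0)

print(knn_predict(train_X, train_y, query, k=3))  # → A
print(shell_neighbors(train_X, query))            # → [0, 3, 4]
print(snn_predict(train_X, train_y, query))       # → B
```

In this toy data the three nearest points all belong to class A and lie on one side of the query, so kNN with k = 3 predicts A. The shell-style selection also picks the nearest point to the left in the first attribute and the nearest point below in the second, and predicts B. Passing `dist=manhattan` leaves both predictions unchanged here; that only illustrates the pluggable metric and is not evidence for the paper's claim about insensitivity to the distance metric.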
