Journal of Frontiers of Computer Science and Technology ›› 2011, Vol. 5 ›› Issue (12): 1139-1152.

• Academic Research •

Condensing Nearest Neighbor Rule with Cellular Automata

ZHAO Li, WANG Lei   

  1. College of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
  2. Department of Information Engineering, Shijiazhuang Vocational Technology Institute, Shijiazhuang 050051, China
  • Online: 2011-12-01 Published: 2011-12-01

Abstract: Most current condensation algorithms for the nearest neighbor rule focus only on the accuracy of the condensed classifier and the number of rules removed, and neglect the efficiency of condensation and the generalization capability of the condensed classifier. To address this problem, this paper presents a cellular automata (CA) based condensation method for the nearest neighbor rule that removes redundant points from a given training set. The method retains only the points on the boundaries between different classes, and the number of condensed rules can be adjusted dynamically through the granularity of the cellular automata lattice. The method has two main advantages: for a large rule set, it condenses the given rules in relatively little running time; and it can obtain a consistent subset of the given sample set step by step, in an accumulated or iterative manner. Experiments on 13 different datasets show that the algorithm is simple and effective and handles the condensation of large sample sets well.

Key words: nearest neighbor rule, cellular automata, condensation, consistent subset
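
The abstract does not give the automaton's update rule, so the following Python sketch only illustrates the general idea of lattice-based boundary retention and is not the authors' algorithm. It assumes a uniform lattice with a hypothetical `cell_size` parameter standing in for the granularity, and it treats a cell as a boundary cell when its Moore neighborhood (the cell itself plus all adjacent cells) contains more than one class. The function name `condense_boundary` is likewise an assumption made for illustration.

```python
from collections import defaultdict
from itertools import product


def condense_boundary(points, labels, cell_size):
    """Return sorted indices of points that lie in boundary lattice cells.

    Illustrative assumption, not the paper's rule: a cell is a boundary
    cell when it and its Moore neighborhood together hold points of more
    than one class.
    """
    # Map each occupied lattice cell to the indices of the points inside it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(int(x // cell_size) for x in p)
        cells[cell].append(idx)

    dim = len(points[0])
    # All offsets in {-1, 0, 1}^dim: the cell itself plus its neighbors.
    offsets = list(product((-1, 0, 1), repeat=dim))

    kept = []
    for cell, members in cells.items():
        # Collect every class label seen in this cell and its neighbors.
        nearby = set()
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            for j in cells.get(neighbor, ()):
                nearby.add(labels[j])
        if len(nearby) > 1:
            kept.extend(members)
    return sorted(kept)


# Two classes along the x-axis; the class boundary lies between x=3.5 and x=4.5.
points = [(x + 0.5, 0.5) for x in range(8)]
labels = ["A"] * 4 + ["B"] * 4

print(condense_boundary(points, labels, cell_size=1.0))  # [3, 4]
print(condense_boundary(points, labels, cell_size=2.0))  # [2, 3, 4, 5]
```

In this toy example a finer lattice keeps fewer points, which mirrors the abstract's statement that the lattice granularity controls how many rules survive. The sketch makes no attempt to guarantee a consistent subset or to reproduce the accumulated or iterative procedure mentioned in the abstract, and the 3^d neighborhood it enumerates becomes impractical as the dimension d grows.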