Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2011, Vol. 5, Issue 12: 1131-1138.

• Academic Research •

Instance Reduction Support Vector Machine

ZHAI Junhai, WANG Tingting, WANG Xizhao

  

  1. Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002, China
  • Online: 2011-12-01 Published: 2011-12-01

Abstract:

In a support vector machine (SVM), the optimal classification hyperplane is determined only by the support vectors, a subset of the training samples lying near the class boundary. Solving the SVM, however, requires the whole training set, so when the training set is large the optimization is slow and consumes a great deal of memory. To address this problem, this paper presents a method, named instance reduction, for selecting candidate support vectors. Because most support vectors lie near the classification boundary, the method uses the tolerance rough set technique to select the instances in the boundary region as candidate support vectors, and then trains the SVM on the selected instances only. Experimental results confirm the effectiveness of the method: it reduces both storage space and execution time, especially on large databases.

Key words: tolerance rough sets, instance selection, support vector machine (SVM), optimal classification hyperplane, statistical learning theory
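
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tolerance relation, so the sketch assumes two instances are tolerant when their Euclidean distance is at most a threshold `eps`, and the names `tolerance_class`, `boundary_instances`, and `eps` are hypothetical. An instance falls in the boundary region when its tolerance class contains more than one class label; only those instances would then be passed to an SVM solver.

```python
import math


def tolerance_class(X, i, eps):
    """Indices of all instances within Euclidean distance eps of X[i].

    Assumption: the tolerance relation is a simple distance threshold.
    The relation is reflexive, so i itself is always included.
    """
    return [j for j, xj in enumerate(X) if math.dist(X[i], xj) <= eps]


def boundary_instances(X, y, eps):
    """Indices of instances whose tolerance class mixes several labels.

    Such instances belong to the upper but not the lower approximation
    of their own decision class, i.e. to the boundary region.
    """
    selected = []
    for i in range(len(X)):
        labels = {y[j] for j in tolerance_class(X, i, eps)}
        if len(labels) > 1:
            selected.append(i)
    return selected


# Toy data: six points on a line, three per class.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
y = [-1, -1, -1, 1, 1, 1]

# Only the two points facing each other across the class gap are kept.
candidates = boundary_instances(X, y, eps=1.5)
reduced_X = [X[i] for i in candidates]
reduced_y = [y[i] for i in candidates]
```

With `eps=1.5` the sketch keeps only the instances at indices 2 and 3, the pair closest to the class boundary. This brute-force version compares every pair of instances, so it costs quadratic time in the training-set size; the choice of `eps` controls how wide the retained boundary band is.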