Journal of Frontiers of Computer Science and Technology, 2014, Vol. 8, Issue (7): 868-876. DOI: 10.3778/j.issn.1673-9418.1403056

Support Vector Machine Active Learning Strategy Based on Vector Cosine

GUO Husheng1, WANG Wenjian1,2+, BAI Longfei1   

1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
Online: 2014-07-01    Published: 2014-07-02

Abstract: In traditional support vector machine (SVM) active learning, the Euclidean distance cannot effectively measure the degree of correlation between high-dimensional samples, which lowers the generalization ability of the learner. To address this problem, this paper proposes an SVM active learning strategy based on vector cosine (SVM active learning based on vector cosine), named the COS_SVMactive method. During active learning, the method uses the vector cosine to measure the information redundancy among training samples, so that the most valuable samples, those carrying important classification information, are selected and handed to experts for manual labeling. In each labeling iteration, the balance of the labeled training set is gradually adjusted so that the learner achieves better generalization performance. Experimental results show that, compared with SVM active learning based on random sampling (RS_SVMactive) and SVM active learning based on distance (DIS_SVMactive), the proposed COS_SVMactive method not only improves classification accuracy but also reduces the manual labeling cost.
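The abstract names three ingredients (an SVM-based choice of informative samples, a vector-cosine redundancy measure, and a gradual balancing of the labeled set) but does not give the exact selection rule. The Python sketch below only illustrates how such ingredients can be combined; it is not the authors' algorithm. It assumes an already trained linear SVM given by weights `w` and bias `b`, ranks unlabeled samples by |f(x)| (closeness to the hyperplane), skips any candidate whose cosine similarity to an already selected sample exceeds a threshold, and alternates between the two sides of the hyperplane to keep the batch balanced. The names `decision_value`, `select_batch` and `cos_threshold`, and the threshold value 0.9, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v (0.0 if either is the zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Hypothetical helper: linear SVM decision function f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_batch(pool: List[Vector], w: Vector, b: float,
                 batch_size: int, cos_threshold: float = 0.9) -> List[int]:
    """Hypothetical batch selection sketch (not the paper's exact rule).

    1. Rank unlabeled samples by |f(x)|, smallest first (most uncertain).
    2. Skip a candidate whose cosine similarity to an already chosen sample
       exceeds cos_threshold, treating it as redundant information.
    3. Alternate between the two sides of the hyperplane for balance.
    """
    ranked = sorted(range(len(pool)),
                    key=lambda i: abs(decision_value(w, b, pool[i])))
    # Split the ranked candidates by predicted side, keeping the ranking order.
    sides = {
        1: [i for i in ranked if decision_value(w, b, pool[i]) >= 0],
        -1: [i for i in ranked if decision_value(w, b, pool[i]) < 0],
    }
    chosen: List[int] = []
    want = 1  # side to draw from next
    while len(chosen) < batch_size and (sides[1] or sides[-1]):
        # Fall back to the other side when the preferred one is exhausted.
        queue = sides[want] if sides[want] else sides[-want]
        candidate = queue.pop(0)
        redundant = any(
            cosine_similarity(pool[candidate], pool[j]) > cos_threshold
            for j in chosen)
        if not redundant:
            chosen.append(candidate)
            want = -want  # switch sides only after a successful pick
    return chosen


if __name__ == "__main__":
    pool = [[0.1, 1.0], [0.2, 2.0], [-0.3, 0.5], [0.5, -1.0], [-2.0, 0.5]]
    print(select_batch(pool, w=[1.0, 0.0], b=0.0, batch_size=3))  # [0, 2, 3]
```

In the usage example, sample 1 is a scaled copy of sample 0, so the cosine measure flags it as redundant (cosine 1.0) even though the two points are about 1.0 apart in Euclidean distance. This is the kind of correlation between samples that the abstract argues a distance-based criterion can miss.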

Key words: support vector machine, active learning, vector cosine, redundancy, balance