Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2012, Vol. 6, Issue (7): 644-653. DOI: 10.3778/j.issn.1673-9418.2012.07.008

• Academic Research •


Sampling Method Based on Scale Space with Gaussian Kernel

ZHU Shunzhi1+, SHI Hua1, LIU Lizhao1, YE Dongyi2   

  1. Department of Computer Science and Technology, Xiamen University of Technology, Xiamen, Fujian 361024, China
  2. Key Lab of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou University, Fuzhou 350002, China
  • Online: 2012-07-01  Published: 2012-07-02

Abstract: The expansion of feature points in a linear scale space is transformed into a within-scale classification problem on a multi-scale dataset, which is a scale-invariant classification problem on imbalanced data. This paper presents a sampling method based on kernel learning in a scale space with a Gaussian kernel, to address the classification of imbalanced datasets by a support vector machine (SVM). The method first over-samples the minority class in the kernel space, then finds the pre-images of the synthetic samples in the input space from the relation between distances in the kernel space and in the input space, and finally appends these pre-images to the original dataset and trains an SVM on it. This overcomes the data inconsistency that existing sampling methods introduce by processing training samples in different spaces. The sampling strategy not only lowers the imbalance ratio of the training dataset but also enlarges the convex hull of the minority class, so the skew of the optimal separating hyperplane (boundary skew) is corrected more effectively. Experimental results on real datasets show that the resulting classifier generalizes better and that the algorithm effectively increases the number of stable feature points within the same scale.
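The first two steps of this pipeline (over-sampling in kernel space, then recovering an input-space pre-image from the kernel-space/input-space distance relation) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes the kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), a SMOTE-style interpolation (1 - lam) phi(x_i) + lam phi(x_j) between two minority samples in kernel space, all minority samples used as distance anchors, and a centered least-squares multilateration for the pre-image, in the spirit of Kwok and Tsang's distance-based pre-image method. For this kernel, ||phi(x) - phi(y)||^2 = 2 - 2 k(x, y), which can be inverted to turn kernel-space distances into input-space distances. All function names are hypothetical, and the final SVM training step is omitted.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Assumed parameterization: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def feature_sq_dist(xi, xj, lam, xk, sigma):
    """Squared kernel-space distance between the synthetic point
    (1 - lam) * phi(xi) + lam * phi(xj) and phi(xk), using only kernel values."""
    def k(a, b):
        return gaussian_kernel(a, b, sigma)

    norm_new = ((1 - lam) ** 2 * k(xi, xi)
                + 2 * lam * (1 - lam) * k(xi, xj)
                + lam ** 2 * k(xj, xj))
    cross = (1 - lam) * k(xi, xk) + lam * k(xj, xk)
    return max(norm_new - 2 * cross + k(xk, xk), 0.0)  # clamp rounding noise


def input_sq_dist(feat_sq, sigma):
    """Invert ||phi(x) - phi(y)||^2 = 2 - 2 * exp(-||x - y||^2 / (2 * sigma^2))."""
    inner = max(1.0 - feat_sq / 2.0, 1e-300)  # guard against log(0)
    return -2.0 * sigma ** 2 * math.log(inner)


def solve_linear(m, v):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(v)
    a = [row[:] + [v[r]] for r, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        if abs(a[c][c]) < 1e-12:
            raise ValueError("degenerate anchors (e.g. all collinear)")
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n + 1):
                a[r][cc] -= f * a[c][cc]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][cc] * z[cc] for cc in range(r + 1, n))
        z[r] = (a[r][n] - s) / a[r][r]
    return z


def pre_image(anchors, sq_dists):
    """Least-squares z with ||z - a_k||^2 ~= d_k^2 for every anchor a_k.
    Subtracting the mean equation linearizes the system:
    2 (a_k - centroid) . z = ||a_k||^2 - mean||a||^2 - d_k^2 + mean(d^2)."""
    m, dim = len(anchors), len(anchors[0])
    centroid = [sum(a[t] for a in anchors) / m for t in range(dim)]
    mean_norm = sum(sum(c * c for c in a) for a in anchors) / m
    mean_d = sum(sq_dists) / m
    rows, rhs = [], []
    for a, d in zip(anchors, sq_dists):
        rows.append([2.0 * (a[t] - centroid[t]) for t in range(dim)])
        rhs.append(sum(c * c for c in a) - mean_norm - d + mean_d)
    # Normal equations (R^T R) z = R^T rhs.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(dim)]
    return solve_linear(ata, atb)


def kernel_space_oversample(minority, i, j, lam, sigma):
    """Hypothetical helper: synthesize one point between phi(x_i) and phi(x_j)
    in kernel space and map it back via its distances to all minority samples."""
    xi, xj = minority[i], minority[j]
    d2 = [input_sq_dist(feature_sq_dist(xi, xj, lam, xk, sigma), sigma)
          for xk in minority]
    return pre_image(minority, d2)


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    # Halfway (in kernel space) between phi((0, 0)) and phi((1, 1)).
    print(kernel_space_oversample(minority, 0, 3, 0.5, sigma=1.0))
```

For this symmetric square, the synthetic point halfway between phi((0, 0)) and phi((1, 1)) maps back to approximately (0.5, 0.5); with lam = 0 or lam = 1 the converted distances are exact and the pre-image reproduces x_i or x_j. Because a kernel-space interpolant generally has no exact pre-image, the recovered point is only an approximation, which is why the sketch fits it by least squares instead of solving exactly. The resulting points would then be appended to the training set before fitting the SVM.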

Key words: classification, Gaussian kernel, scale space, convex hull, imbalanced datasets