Journal of Frontiers of Computer Science and Technology, 2012, Vol. 6, Issue (5): 473-480. DOI: 10.3778/j.issn.1673-9418.2012.05.009

Academic Research

Personal Credit Scoring Method Using Improved Graph-Based Semi-Supervised Learning

ZHANG Yan¹, ZHANG Chenguang¹⁺, ZHANG Xiahuan²

1. College of Information Science and Technology, Hainan University, Haikou 570228, China
2. College of Computer Science, Beijing University of Technology, Beijing 100124, China

Publication date: 2012-05-01  Release date: 2012-05-09

Abstract: In personal credit scoring, labeled instances are expensive to collect, whereas unlabeled data are relatively easy to obtain; in addition, credit datasets are commonly asymmetric. To address both problems, this paper proposes a personal credit scoring model based on an improved graph-based semi-supervised learning method. Because the model uses semi-supervised learning, it can learn from abundant unlabeled instances and thereby avoid the loss of generalization ability caused by the relative scarcity of labeled data. Furthermore, the graph-based semi-supervised learning method is improved by normalizing its iterative results and modifying the decision boundary, which effectively reduces the adverse impact of asymmetric datasets. Experiments on three UCI credit approval datasets show that the proposed model achieves significantly better results than support vector machines and the unimproved method.

Key words: credit scoring, support vector machine, graph-based semi-supervised learning, asymmetric dataset
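The abstract names the ingredients of the model (propagating labels over a graph of labeled and unlabeled applicants, normalizing the iterative result, and modifying the decision boundary) but not their exact form. Purely as an illustration, here is a minimal pure-Python sketch under stated assumptions: a Gaussian (RBF) similarity graph, the label-spreading iteration of Zhou et al. (F ← αSF + (1 − α)Y), a column-sum normalization, and a hypothetical `threshold` parameter for shifting the boundary. The function names, parameters, and toy data are all hypothetical, features are assumed numeric and pre-scaled, and none of this should be read as the authors' formulation.

```python
import math

# Illustrative sketch only; two classes: 0 = bad credit, 1 = good credit.
# Assumptions (not from the paper): RBF graph, Zhou et al. label spreading,
# column-sum normalization, and a hypothetical `threshold` decision boundary.


def rbf_affinity(points, sigma):
    """Dense Gaussian affinity matrix with a zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def propagate(points, labels, sigma=1.0, alpha=0.9, iters=100):
    """Iterate F <- alpha*S*F + (1 - alpha)*Y with S = D^-1/2 W D^-1/2.

    labels[i] is 0, 1, or None for an unlabeled applicant.
    Returns one [bad_score, good_score] row per applicant.
    """
    n = len(points)
    w = rbf_affinity(points, sigma)
    deg = [sum(row) for row in w]
    # Symmetrically normalized affinities; zero entries stay zero.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if w[i][j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = [[1.0 if labels[i] == c else 0.0 for c in (0, 1)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        # The new F is built entirely from the previous iterate.
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1 - alpha) * y[i][c] for c in (0, 1)] for i in range(n)]
    return f


def normalize_columns(f):
    """Scale each class column to unit total mass (assumed normalization)."""
    totals = [sum(row[c] for row in f) for c in (0, 1)]
    return [[row[c] / totals[c] if totals[c] > 0 else 0.0 for c in (0, 1)]
            for row in f]


def classify(f, threshold=0.5):
    """Predict 1 (good) when the good-score share reaches `threshold`.

    Moving `threshold` away from 0.5 shifts the decision boundary.
    """
    preds = []
    for bad, good in f:
        total = bad + good
        share = good / total if total > 0 else 0.5
        preds.append(1 if share >= threshold else 0)
    return preds


if __name__ == "__main__":
    # Toy 1-D data: two clusters, one labeled applicant in each.
    pts = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
    labs = [0, None, None, 1, None, None]
    scores = normalize_columns(propagate(pts, labs, sigma=0.3))
    print(classify(scores))  # -> [0, 0, 0, 1, 1, 1]
```

The (1 − α)Y term keeps labeled applicants anchored while scores diffuse to unlabeled neighbors. Normalizing each column stops the class with more propagated mass from winning by size alone, and with `threshold=0.5` the rule simply compares the two normalized scores; moving it is one simple way to trade errors on one class against the other.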