Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2018, Vol. 12 ›› Issue (7): 1145-1153. DOI: 10.3778/j.issn.1673-9418.1705080

• Artificial Intelligence and Pattern Recognition •

Ensemble Transfer Learning Algorithm for Absolute Imbalanced Data Classification

YAO Susu, WANG Baoliang, HOU Yonghong   

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  2. Information and Network Center, Tianjin University, Tianjin 300072, China
  • Online: 2018-07-01 Published: 2018-07-06

Abstract:

To address the problem of learning from absolutely imbalanced training data, this paper proposes an ensemble transfer learning algorithm based on a cascade structure. The algorithm consists of two parts: transfer learning and data selection. In the transfer learning stage, a weight recovery factor is introduced to solve the problem that the weights of auxiliary-domain samples cannot be recovered in the TrAdaBoost algorithm. In the data selection stage, the algorithm uses the cascade structure to gradually remove noisy and redundant samples from the auxiliary domain at each node, making full use of the auxiliary-domain data while keeping the target domain dominant. Experimental results on real data sets show that, when the data are absolutely imbalanced, the algorithm improves the classifier's F-measure and G-mean. The proposed algorithm can therefore alleviate the absolute imbalance of training data to a certain extent.
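To illustrate why auxiliary-domain weights are irreversible in TrAdaBoost, the following is a minimal sketch of one round of the standard TrAdaBoost weight update (as commonly described in the literature), not the algorithm proposed in this paper. The weight recovery factor is not specified in the abstract and is deliberately omitted. The function name, argument names, and the clamping of the target error are assumptions made for illustration.

```python
import math


def tradaboost_update(w_src, err_src, w_tgt, err_tgt, n_rounds):
    """One round of the standard TrAdaBoost weight update (illustrative sketch).

    w_src, w_tgt     : current weights of auxiliary (source) and target samples
    err_src, err_tgt : |h(x_i) - y_i| per sample, 0 if correct and 1 if wrong
    n_rounds         : total number of boosting rounds N
    """
    n = len(w_src)
    # Fixed factor for source samples; always in (0, 1] for n >= 1.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))

    # Weighted error of the current hypothesis on the target domain only.
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    # Assumption: clamp eps so that beta_t stays strictly inside (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)

    # Misclassified source samples are multiplied by beta <= 1 and therefore
    # shrink; correctly classified ones are left unchanged, never raised.
    new_src = [w * beta ** e for w, e in zip(w_src, err_src)]
    # Misclassified target samples are multiplied by 1 / beta_t > 1.
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, err_tgt)]
    return new_src, new_tgt
```

Because every source weight is only ever multiplied by a factor of at most 1 in this scheme, a sample that is misclassified in early rounds cannot regain its weight later, which is the limitation the weight recovery factor is meant to address.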

Key words: ensemble transfer learning, cascade model, imbalanced data, TrAdaBoost
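The two evaluation measures named in the abstract can be sketched as follows, assuming the usual definitions for imbalanced binary classification: the F-measure as the weighted harmonic mean of precision and recall, and the G-mean as the geometric mean of recall on the minority (positive) class and specificity on the majority (negative) class. The abstract does not state the exact definitions used in the paper, so the function name and the `beta` parameter below are illustrative assumptions.

```python
import math


def f_measure_and_g_mean(tp, fn, fp, tn, beta=1.0):
    """Return (F-measure, G-mean) from binary confusion-matrix counts.

    tp, fn : minority (positive) samples predicted correctly / incorrectly
    fp, tn : majority (negative) samples predicted incorrectly / correctly
    beta   : relative weight of recall in the F-measure (1.0 gives F1)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0

    # Geometric mean of the per-class recalls; zero if either class is ignored.
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean
```

A classifier that labels every sample as the majority class (for example `tp=0, fn=10, fp=0, tn=90`) reaches 90% accuracy but scores 0 on both measures, which is why these measures are preferred over accuracy for imbalanced data.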