Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (3): 472-483. DOI: 10.3778/j.issn.1673-9418.1703025


Transfer Spectral Clustering Based on Inter-Domain F-Norm Regularization

WEI Caina+, QIAN Pengjiang, XI Chen   

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2018-03-01  Published: 2018-03-08


魏彩娜+,钱鹏江,奚  臣   

  1. School of Digital Media Technology, Jiangnan University, Wuxi, Jiangsu 214122, China

Abstract:  Traditional clustering algorithms usually perform poorly when the target data are heavily corrupted by noise. To address this problem, this paper builds on the classic spectral clustering (SC) algorithm and, using a transfer learning strategy, proposes a transfer spectral clustering algorithm based on inter-domain F-norm regularization (TSC-IDFR). For the data in the target domain, TSC-IDFR first selects reference examples from the source domain (historical data) according to the Kth-nearest-neighbor principle, so that the number of selected examples equals the number of data points in the target domain. Then, through inter-domain F-norm regularization, the matrix formed by the spectral eigenvectors of the selected reference examples is used to assist spectral clustering of the target data. In this way, TSC-IDFR can cluster the target data set (target domain) effectively even when it contains substantial noise. Experiments on both synthetic and real data sets demonstrate the effectiveness of the proposed algorithm.
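
The two steps described in the abstract (reference selection, then regularized spectral clustering) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not give the exact objective, so several choices here are assumptions: a Gaussian affinity with bandwidth `sigma`, one reference per target point taken as its K-th nearest source point, and an F-norm penalty between projection matrices, (lam/2)·||UUᵀ − VVᵀ||_F², chosen because for orthonormal U and V it equals lam·k − lam·||UᵀV||_F², so the regularized problem reduces to an eigen-decomposition of S + lam·VVᵀ. All function and parameter names are hypothetical.

```python
import numpy as np


def normalized_affinity(X, sigma=1.0):
    """Gaussian affinity W, symmetrically normalized as D^{-1/2} W D^{-1/2}."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def top_eigenvectors(S, k):
    """Return the k eigenvectors of symmetric S with the largest eigenvalues."""
    _, vecs = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
    return vecs[:, -k:]


def select_references(X_target, X_source, K=1):
    """Assumed reading of the K-th nearest neighbor principle:
    each target point gets its K-th nearest source point as reference,
    so the reference set has the same size as the target set."""
    sq = np.sum((X_target[:, None, :] - X_source[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(sq, axis=1)[:, K - 1]
    return X_source[idx]


def kmeans(Y, k, n_iter=50, seed=0):
    """Small Lloyd-style k-means, included only to keep the sketch self-contained."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def transfer_spectral_clustering(X_target, X_source, k, K=1, lam=1.0, sigma=1.0):
    """Spectral clustering of the target data, guided by source-domain eigenvectors."""
    # Step 1: pick as many source reference examples as there are target points.
    refs = select_references(X_target, X_source, K)
    # Step 2: spectral eigenvector matrix of the reference examples.
    U_src = top_eigenvectors(normalized_affinity(refs, sigma), k)
    # Step 3: regularized embedding; the assumed penalty folds into the matrix.
    S_tgt = normalized_affinity(X_target, sigma)
    U = top_eigenvectors(S_tgt + lam * (U_src @ U_src.T), k)
    # Step 4: row-normalize the embedding and cluster its rows.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)
```

Calling `transfer_spectral_clustering(X_target, X_source, k=2)` returns one cluster label per target point. In this sketch `lam` controls how strongly the source-domain structure is trusted: `lam=0` recovers plain spectral clustering on the target data alone.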

Key words: transfer learning, spectral clustering, regularization

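Abstract (translated from the Chinese): Traditional clustering algorithms perform poorly when the target data set is heavily contaminated by noise or anomalous data. To address this problem, transfer learning is incorporated into the classic spectral clustering (SC) algorithm, and a new transfer spectral clustering algorithm based on inter-domain F-norm regularization (TSC-IDFR) is proposed. Using the Kth-nearest-neighbor principle, the algorithm obtains from the source domain (historical data) an equal number of reference samples for the target-domain data. Then, based on the inter-domain F-norm regularization mechanism, it transfers the spectral clustering feature matrix of these source-domain reference samples to assist the spectral clustering process on the target-domain data set. This addresses the clustering difficulties that contamination of the target-domain data causes in practice and ultimately improves spectral clustering performance. Simulation experiments on synthetic and real data sets demonstrate the effectiveness of the algorithm.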

Key words (translated from the Chinese): transfer learning, spectral clustering, regularization