Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (6): 1329-1342. DOI: 10.3778/j.issn.1673-9418.2112076

• Theory · Algorithms •

可能性聚类假设的多模适应学习方法

但雨芳,陶剑文,赵悦,潘婕,赵宝奇   

  1. 宁波职业技术学院 电子信息工程学院,浙江 宁波 315800
  2. 哈尔滨工业大学 航天学院,哈尔滨 150001
  • 出版日期:2023-06-01 发布日期:2023-06-01

Multi-model Adaptation Method of Possibilistic Clustering Assumption

DAN Yufang, TAO Jianwen, ZHAO Yue, PAN Jie, ZHAO Baoqi   

  1. School of Electronics and Information Engineering, Ningbo Polytechnic, Ningbo, Zhejiang 315800, China
  2. School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
  • Online:2023-06-01 Published:2023-06-01

摘要: 基于图的半监督学习(GSSL)凭借其直观性和良好的学习性能,在机器学习领域吸引了越来越多的关注。然而,通过分析发现,现有基于图的半监督学习方法存在对噪声、异常数据的鲁棒性不够好以及较敏感的问题。此外,该方法具有较好性能的前提是训练数据与测试数据为独立同分布(IID),导致在实际应用中存在一定的局限性。为解决上述问题,在某个再生核Hilbert空间,在充分考虑最小化噪声、异常数据影响的基础上,结合不同数据分布特点,基于结构风险最小化模型,提出一种基于可能性聚类假设的多模型适应学习方法(MA-PCA)。其主要思想为:通过模糊熵减弱噪声、异常数据对方法所带来的负面影响;综合考虑训练数据与测试数据在独立同分布和在独立不同分布时进行有效的多模适应学习,弱化训练数据和测试数据的独立同分布约束条件亦具有较好性能;给出了算法实现及其收敛性定理。在多个真实视觉数据集上分别进行了大量实验并进行深入分析,证实了所提方法具有优越的或可比较的鲁棒性和泛化性能。

关键词: 基于图的半监督学习(GSSL), 多模适应, 可能性聚类, 模糊熵

Abstract: Graph-based semi-supervised learning (GSSL) has attracted increasing attention in the machine learning community because of its intuitiveness and good learning performance. However, analysis shows that existing GSSL methods are sensitive to noise and abnormal data and are not sufficiently robust against them. In addition, GSSL performs well only on the premise that the training data and test data are independent and identically distributed (IID), which limits its use in practical applications. To address these problems, this paper proposes a multi-model adaptation method based on the possibilistic clustering assumption (MA-PCA). Built on the structural risk minimization model in a reproducing kernel Hilbert space, the method aims to minimize the influence of noise and abnormal data while taking different data distributions into account. Its main ideas are as follows: fuzzy entropy is used to reduce the negative impact of noise and abnormal data on the method; multi-model adaptation learning is carried out both when the training and test data follow the same distribution and when they follow different distributions, so that the method still performs well when the IID constraint between training and test data is relaxed; and the algorithm implementation and its convergence theorem are given. Extensive experiments and in-depth analysis on multiple real visual datasets show that the proposed method achieves superior or comparable robustness and generalization performance.

Key words: graph based semi-supervised learning (GSSL), multi-model adaptation, possibilistic clustering, fuzzy entropy
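
The abstract credits fuzzy entropy with weakening the influence of noise and abnormal data but gives no formulas. The following is a minimal sketch of that general idea only, not the paper's MA-PCA algorithm: it assumes an entropy-regularized possibilistic objective, sum_i u_i d_i^2 + eta * sum_i (u_i log u_i - u_i), whose minimizer is u_i = exp(-d_i^2 / eta). The function names, the parameter `eta`, and the toy data are all hypothetical and chosen for illustration.

```python
import math


def possibilistic_memberships(points, center, eta):
    """Typicality u_i = exp(-d_i^2 / eta) of each point with respect to a center.

    Assumed objective (illustrative, not taken from the paper):
        sum_i u_i * d_i^2 + eta * sum_i (u_i * log(u_i) - u_i)
    Setting its derivative in u_i to zero gives the closed form used here.
    """
    memberships = []
    for p in points:
        d2 = sum((a - b) ** 2 for a, b in zip(p, center))
        memberships.append(math.exp(-d2 / eta))
    return memberships


def weighted_center(points, weights):
    """Weighted mean of the points, one coordinate at a time."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)]


def fit_center(points, eta=1.0, iters=20):
    """Alternate membership and center updates, starting from the plain mean."""
    center = weighted_center(points, [1.0] * len(points))
    memberships = [1.0] * len(points)
    for _ in range(iters):
        memberships = possibilistic_memberships(points, center, eta)
        center = weighted_center(points, memberships)
    return center, memberships


# Toy data: four points near the origin plus one far-away abnormal point.
points = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1), (0.1, 0.2), (8.0, 8.0)]
plain_mean = weighted_center(points, [1.0] * len(points))
center, u = fit_center(points)

print("plain mean:       ", plain_mean)
print("robust center:    ", center)
print("outlier typicality:", u[-1])
```

In this toy run the abnormal point pulls the plain mean away from the cluster, while its typicality under the entropy-regularized weighting is close to zero, so the fitted center stays with the four nearby points. A smaller `eta` makes the weighting stricter; a larger one makes it behave more like the plain mean.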