Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (3): 674-692. DOI: 10.3778/j.issn.1673-9418.2211120

• Theory and Algorithms •

Possibilistic Distribution Distance Measure: A Robust Domain Adaptation Learning Method

DAN Yufang, TAO Jianwen

  1. Institute of Artificial Intelligence Application, Ningbo Polytechnic, Ningbo, Zhejiang 315800, China
  • Online: 2024-03-01 Published: 2024-03-01

Abstract: Domain adaptation (DA) learning addresses the mismatch between the distributions of the training and test datasets and has attracted extensive attention. Most existing DA methods reduce this mismatch by minimizing the maximum mean discrepancy (MMD) between domains, or one of its variants. However, noisy data in a domain can cause the domain mean to drift significantly, which degrades the adaptation performance of methods based on MMD and its variants. To address this, this paper proposes a robust domain adaptation learning method based on a possibilistic distribution distance measure. First, the traditional MMD criterion is transformed into a novel possibilistic clustering model that weakens the influence of noisy data, yielding a robust possibilistic distribution distance measure (P-DDM) criterion; a fuzzy entropy regularization term is introduced to further improve the robustness and effectiveness of domain distribution alignment. Second, based on the P-DDM criterion, a robust domain adaptation visual classifier (C-PDDM) is proposed. It uses a graph Laplacian matrix to preserve the geometric structure consistency of the data within the source and target domains, which improves label propagation, and it makes maximal use of the discriminative information in the source domain to minimize the domain discrimination error, which further improves generalization. Theoretical analysis confirms that, under certain conditions, the proposed P-DDM is an upper bound of the traditional MMD criterion, so minimizing P-DDM also effectively optimizes the MMD objective. Finally, the method is compared with several representative domain adaptation methods. Experimental results on 6 visual benchmark datasets (Office31, Office-Caltech, Office-Home, PIE, MNIST-UPS, and COIL20) show average improvements of about 5% in generalization performance and about 10% in robustness.

Key words: domain adaptation (DA), possibilistic clustering, maximum mean discrepancy (MMD), fuzzy entropy
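
The abstract's motivation, that noisy samples can drag a domain mean and so distort an MMD-style distance, can be illustrated with a small numerical sketch. The code below is not the paper's P-DDM or C-PDDM: the exponential membership form `exp(-d^2 / eta)`, the median-based choice of `eta`, the fixed-point iteration, and all function names are assumptions made for illustration only. It compares the squared distance between plain domain means (MMD with a linear kernel) with the distance between membership-weighted centres.

```python
import numpy as np

def linear_mmd(xs, xt):
    """Squared distance between the plain domain means (linear-kernel MMD)."""
    return float(np.sum((xs.mean(axis=0) - xt.mean(axis=0)) ** 2))

def possibilistic_center(x, n_iter=20):
    """Membership-weighted centre of x (hypothetical helper).

    Assumed membership form: u_i = exp(-d_i^2 / eta), with eta set to the
    median squared distance. Samples far from the centre get u_i near 0.
    """
    center = np.median(x, axis=0)  # robust starting point
    for _ in range(n_iter):
        d2 = np.sum((x - center) ** 2, axis=1)
        eta = np.median(d2) + 1e-12  # scale parameter (assumption)
        u = np.exp(-d2 / eta)
        center = (u[:, None] * x).sum(axis=0) / u.sum()
    return center, u

def weighted_center_distance(xs, xt):
    """Squared distance between the membership-weighted domain centres."""
    cs, _ = possibilistic_center(xs)
    ct, _ = possibilistic_center(xt)
    return float(np.sum((cs - ct) ** 2))

# Synthetic data: both domains share one distribution, but the target
# domain is contaminated with 20 far-away noise points.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))
target = rng.normal(0.0, 1.0, size=(200, 2))
noise = rng.normal(15.0, 1.0, size=(20, 2))
noisy_target = np.vstack([target, noise])

print("plain mean distance:   ", linear_mmd(source, noisy_target))
print("weighted mean distance:", weighted_center_distance(source, noisy_target))
```

In this toy setting the noise points receive memberships close to zero, so the weighted centres stay near the bulk of each domain while the plain mean of the contaminated target drifts toward the noise cluster.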