Self-training Semi-supervised Learning Algorithm for New Class Detection

doi:10.3778/j.issn.1673-9418.2206059

Abstract

Abstract: Traditional semi-supervised learning (SSL) algorithms suffer from a limited range of application and insufficient generalization capability. In particular, their performance degrades severely when the training dataset contains samples from new classes with unseen labels. Obtaining labeled samples by manual annotation requires domain experts, which is costly in both time and money, and mislabeled samples are unavoidable because the experts' background knowledge is limited. SSL algorithms that aim to label samples with unseen labels correctly are therefore urgently needed in practice. After a detailed analysis of the self-training algorithm, an effective new class detection SSL (NCD-SSL) algorithm is proposed. Firstly, a universal incremental extreme learning machine that handles both class-incremental and sample-incremental learning is constructed on the basis of the classical extreme learning machine. Secondly, the self-training algorithm is improved: samples with high-confidence labels are used for sample-incremental learning, and a buffer pool is set up to store samples with low-confidence labels. Thirdly, the samples in the buffer pool are processed with clustering and a distribution consistency test to detect new classes, which enables class-incremental learning. Finally, experiments on synthetic and real datasets validate the feasibility and effectiveness of the proposed algorithm. The results show that, when 3, 2 and 1 classes are missing, the testing accuracy of NCD-SSL is in general about 30, 20 and 10 percentage points higher, respectively, than that of six other SSL algorithms, which confirms that the proposed algorithm achieves better SSL performance in new class detection.

Key words: semi-supervised learning (SSL), new class detection, self-training, extreme learning machine, maximum mean discrepancy, distribution consistency
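The second step of the abstract, routing high-confidence samples to incremental learning and low-confidence samples to a buffer pool, can be illustrated with a minimal sketch. This is not the authors' implementation. It uses a basic (non-incremental) extreme learning machine with random sigmoid hidden nodes and a pseudo-inverse solution for the output weights. The function names, the margin-based confidence measure, the threshold value and the number of hidden nodes are all assumptions chosen for illustration.

```python
import numpy as np


def _hidden(X, W, b):
    # Sigmoid activations of the random hidden layer.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, Y_onehot, n_hidden=20, seed=0):
    """Fit a basic ELM: random input weights, least-squares output weights.

    Hypothetical helper; n_hidden and the weight range are assumptions.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = _hidden(X, W, b)
    # Output weights from the Moore-Penrose pseudo-inverse of H.
    beta = np.linalg.pinv(H) @ Y_onehot
    return W, b, beta


def elm_scores(model, X):
    """Return the raw output scores, one column per known class."""
    W, b, beta = model
    return _hidden(X, W, b) @ beta


def split_by_confidence(scores, threshold=0.5):
    """Assign pseudo-labels and flag which samples are high-confidence.

    Confidence is taken here as the gap between the two largest scores
    (an assumption; the paper may define it differently). Samples whose
    flag is False would go to the buffer pool for later clustering.
    Requires at least two classes.
    """
    ordered = np.sort(scores, axis=1)
    margin = ordered[:, -1] - ordered[:, -2]
    labels = scores.argmax(axis=1)
    confident = margin >= threshold
    return labels, confident
```

In a self-training loop under these assumptions, the samples with `confident == True` would be added to the labeled set with their pseudo-labels, and the remaining samples would be kept in the buffer pool until enough of them accumulate for new class detection.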

HE Yulin, CHEN Jiaqi, HUANG Qihang, Philippe Fournier-Viger, HUANG Zhexue. Self-training Semi-supervised Learning Algorithm for New Class Detection[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(9): 2184-2197.
