Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (6): 1279-1290. DOI: 10.3778/j.issn.1673-9418.2111144

• Surveys and Frontiers •

Survey on Pseudo-Labeling Methods in Deep Semi-supervised Learning

LIU Yafen1,2, ZHENG Yifeng1,2,+, JIANG Lingyi1,2, LI Guohe3, ZHANG Wenjie1,2

  1. College of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
  2. Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou, Fujian 363000, China
  3. College of Information Science and Engineering, China University of Petroleum, Beijing 102249, China
  • Received: 2021-11-02 Revised: 2022-01-05 Online: 2022-06-01 Published: 2022-01-17
  • About the authors: LIU Yafen, born in 1999, M.S. candidate, member of CCF. Her research interests include machine learning and deep learning.
    ZHENG Yifeng, born in 1980, Ph.D., lecturer, M.S. supervisor, member of CCF and CAAI. His research interests include artificial intelligence, machine learning and deep learning.
    JIANG Lingyi, born in 1998, M.S. candidate, member of CCF. Her research interests include machine learning and deep learning.
    LI Guohe, born in 1965, Ph.D., professor, Ph.D. supervisor. His research interests include artificial intelligence, machine learning and knowledge discovery.
    ZHANG Wenjie, born in 1984, Ph.D., professor, M.S. supervisor. His research interests include mobile edge computing and Internet of things.
  • Supported by:
    National Natural Science Foundation of China (62141602); Natural Science Foundation of Fujian Province (2021J011002); Natural Science Foundation of Fujian Province (2021J011004); Karamay Science & Technology Research Project (2020CGZH0009)

  • Corresponding author: + E-mail: zyf@mnnu.edu.cn
  • Authors' native places: LIU Yafen, Nanping, Fujian; ZHENG Yifeng, Zhangzhou, Fujian; JIANG Lingyi, Fuzhou, Fujian; LI Guohe, Pinghe, Fujian; ZHANG Wenjie, Zhangzhou, Fujian.

Abstract:

With the development of intelligent technology, deep learning has become a research hotspot in machine learning and plays an increasingly important role in many fields. Deep learning requires large amounts of labeled data to improve model performance. To address the shortage of labeled data, researchers have combined semi-supervised learning with deep learning, building models from a small amount of labeled data together with a large amount of unlabeled data, which helps to expand the sample space. In view of the theoretical significance and practical value of deep semi-supervised learning, this paper takes pseudo-labeling methods as its starting point. Firstly, deep semi-supervised learning is introduced and the advantages of pseudo-labeling methods are pointed out. Secondly, pseudo-labeling methods are described from the perspectives of self-training and multi-view training, and the existing models are comprehensively analyzed. Then, label propagation methods based on graphs and pseudo-labeling are introduced, and the existing pseudo-labeling methods are analyzed and compared experimentally. Finally, the problems and future research directions of pseudo-labeling methods are summarized in terms of the utility of unlabeled data, noisy data, rationality, and the combination of pseudo-labeling methods.
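
The self-training idea named in the abstract can be illustrated with a minimal sketch. It is not taken from the survey: the nearest-centroid classifier, the softmax-over-distance confidence, the 0.8 threshold, and the function names (`centroids`, `predict_proba`, `self_train`) are all assumptions chosen to keep the example small. A deep model would replace the centroid classifier in practice.

```python
import math

def centroids(points, labels):
    """Mean of the 2-D points in each class (the 'model' in this sketch)."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def predict_proba(cents, point):
    """Assumed confidence score: softmax over negative centroid distances."""
    scores = {c: math.exp(-math.dist(point, m)) for c, m in cents.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def self_train(labeled, labels, unlabeled, threshold=0.8, max_rounds=10):
    """Repeatedly fit on labeled data, then pseudo-label confident samples."""
    points, targets = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, targets)  # refit on labeled + pseudo-labeled
        still_unlabeled, added = [], 0
        for p in pool:
            proba = predict_proba(cents, p)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                # Confident prediction: adopt it as a pseudo-label.
                points.append(p)
                targets.append(best)
                added += 1
            else:
                still_unlabeled.append(p)
        pool = still_unlabeled
        if added == 0:  # nothing new was labeled, so stop early
            break
    return centroids(points, targets), pool

labeled = [(0.0, 0.0), (4.0, 4.0)]
labels = [0, 1]
unlabeled = [(0.5, 0.2), (3.8, 4.1), (1.0, 0.5), (3.0, 3.5), (2.0, 2.0)]
model, remaining = self_train(labeled, labels, unlabeled)
print(remaining)  # [(2.0, 2.0)]
```

The point midway between the two classes never reaches the confidence threshold and stays unlabeled. This reflects the usual trade-off in threshold-based pseudo-labeling: a higher threshold admits fewer noisy labels but uses less of the unlabeled data.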

Key words: deep learning, semi-supervised learning, pseudo-labeling, label propagation
