Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (9): 1444-1453. DOI: 10.3778/j.issn.1673-9418.1705037

Ternary Error-Correcting Output Codes Based Partial Label Learning Algorithm

ZHOU Binbin1,2, ZHANG Minling1,2+, LIU Xuying1,2   

  1. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
  2. Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 210096, China
  • Online: 2018-09-01 Published: 2018-09-10

Abstract: Partial label learning is an important weakly supervised learning framework in which each training example is associated with a set of candidate labels, exactly one of which is the ground-truth label. The more candidate labels an example has, the harder the learning problem becomes. To reduce the number of candidate labels and thereby the difficulty of learning, this paper proposes PL-TECOC, a partial label learning algorithm based on ternary error-correcting output codes. PL-TECOC transforms a partial label learning problem into a series of binary learning problems and then combines the resulting binary classifiers. When constructing the training data for each binary problem, a code of "0" marks the corresponding label as ignored, and the positive and negative classes are built only from the labels with non-zero codes, which reduces the number of candidate labels that must be considered. Experiments on artificial and real-world datasets show that PL-TECOC achieves good classification performance compared with several popular partial label learning algorithms.

Key words: weakly supervised learning, disambiguation, error-correcting output codes, partial label learning
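
The construction of binary training data described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_binary_dataset`, the encoding of a code column as a list of +1, -1, and 0, and the rule for keeping or skipping an example are all assumptions made here for illustration. In particular, the abstract does not say how an example is assigned to a class; the sketch assumes an example is kept only when its candidate labels, after dropping those coded "0", all fall on the same side of the column.

```python
from typing import List, Sequence, Set, Tuple


def build_binary_dataset(
    candidate_sets: Sequence[Set[int]],
    column: Sequence[int],
) -> List[Tuple[int, int]]:
    """Return (example_index, binary_label) pairs for one ternary code column.

    column[j] is +1, -1, or 0 for class label j; labels coded 0 are ignored.
    Assumption (not stated in the abstract): an example is used only if its
    remaining candidate labels all lie on the same side of the column.
    """
    positive = {j for j, code in enumerate(column) if code == 1}
    negative = {j for j, code in enumerate(column) if code == -1}
    dataset = []
    for i, candidates in enumerate(candidate_sets):
        # Drop candidate labels that this column codes as 0.
        remaining = {j for j in candidates if column[j] != 0}
        if not remaining:
            continue  # every candidate label is ignored by this column
        if remaining <= positive:
            dataset.append((i, 1))
        elif remaining <= negative:
            dataset.append((i, -1))
        # Otherwise the candidates straddle both classes: skip the example.
    return dataset


# Hypothetical toy data: four examples over class labels 0..3.
candidate_sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
column = [1, 0, -1, -1]  # label 0 positive, label 1 ignored, labels 2 and 3 negative
print(build_binary_dataset(candidate_sets, column))
# prints [(0, 1), (1, -1), (2, -1)]
```

In this toy run, ignoring label 1 shrinks the first two candidate sets to a single label each, so both examples become usable for the binary problem, while the last example is skipped because its candidates fall on both sides of the column.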
