Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (3): 470-481. DOI: 10.3778/j.issn.1673-9418.1903053

• Artificial Intelligence •

Multi-Label Feature Extraction Method Relied on Feature-Label Dependence Auto-encoder

CHENG Yusheng, LI Zhiwei, PANG Shufang   

  1. School of Computer and Information, Anqing Normal University, Anqing, Anhui 246011, China
    2. The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing, Anhui 246011, China
  • Online: 2020-03-01  Published: 2020-03-13

Abstract:

In multi-label learning, handling high-dimensional features has long been a difficult research problem. Feature extraction algorithms can effectively mitigate the degradation in classification performance caused by the high dimensionality of data features. However, existing multi-label feature extraction algorithms rarely make full use of feature information, nor do they fully extract both the independent and the fused “feature-label” information. To address this, a multi-label feature extraction method based on a feature-label dependence auto-encoder is proposed. A kernel extreme learning machine auto-encoder fuses the original label space with the original feature space and generates a reconstructed feature space. On the one hand, the Hilbert-Schmidt independence criterion is maximized to make full use of the information between labels and features; on the other hand, principal component analysis is used to reduce the information loss during feature extraction. The two are combined to extract “feature-feature” and “feature-label” information, respectively. Comparison experiments on multiple high-dimensional multi-label Yahoo datasets show that the proposed algorithm outperforms five current mainstream multi-label feature extraction methods, verifying its effectiveness.
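The abstract names the building blocks (a kernel extreme learning machine auto-encoder, HSIC maximization, and PCA) but not the exact objective, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. Every choice below is hypothetical: an RBF kernel with parameter `gamma`, a KELM auto-encoder whose input is the concatenation [X, Y] and whose reconstruction target is X, a linear label kernel L = YYᵀ for HSIC, and separate PCA and HSIC projections that are simply concatenated. The names `kelm_autoencoder`, `extract`, `k_pca`, `k_hsic` and `C` are illustrative. The KELM output weights use the standard closed form β = (I/C + Ω)⁻¹T, and with a linear feature kernel the empirical HSIC reduces to tr(PᵀXᵀHLHXP), where H is the centering matrix.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kelm_autoencoder(X, Y, C=1.0, gamma=0.1):
    """ASSUMPTION: fused input is [X, Y]; reconstruction target is X.

    Standard KELM closed form: beta = (I/C + Omega)^-1 T, output = Omega @ beta.
    """
    Z = np.hstack([X, Y])                        # fuse feature and label spaces
    omega = rbf_kernel(Z, Z, gamma)              # n x n kernel matrix
    n = X.shape[0]
    beta = np.linalg.solve(np.eye(n) / C + omega, X)
    return omega @ beta                          # reconstructed n x d features


def top_eigvecs(M, k):
    """Orthonormal eigenvectors of symmetric M for its k largest eigenvalues."""
    _, vecs = np.linalg.eigh(M)                  # eigh sorts eigenvalues ascending
    return vecs[:, ::-1][:, :k]


def extract(X, Y, k_pca, k_hsic, C=1.0, gamma=0.1):
    """Hypothetical pipeline: PCA ("feature-feature") + HSIC ("feature-label")."""
    Xr = kelm_autoencoder(X, Y, C, gamma)
    Xc = Xr - Xr.mean(axis=0)                    # equals H @ Xr, H = I - 11^T/n
    # PCA: maximise tr(P^T Xc^T Xc P)
    P_pca = top_eigvecs(Xc.T @ Xc, k_pca)
    # HSIC with linear label kernel L = Y Y^T:
    # tr(P^T Xr^T H L H Xr P) = tr(P^T A A^T P) with A = Xc^T Y
    A = Xc.T @ Y
    P_hsic = top_eigvecs(A @ A.T, k_hsic)
    # ASSUMPTION: the two projections are concatenated, not weighted
    P = np.hstack([P_pca, P_hsic])
    return Xc @ P, P


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                    # 50 samples, 20 features
Y = (rng.random((50, 5)) > 0.7).astype(float)    # 5 binary labels
Z_new, P = extract(X, Y, k_pca=4, k_hsic=3)
print(Z_new.shape, P.shape)                      # (50, 7) (20, 7)
```

Because labels are unavailable at test time, a projection learned this way would typically be applied directly to centred test features; the abstract does not say how the paper handles this, nor how it weights the two objectives.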

Key words: multi-label feature extraction, feature-label dependence, kernel extreme learning machine, principal component analysis, autoencoder