计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2016, Vol. 10, Issue (4): 543-553. DOI: 10.3778/j.issn.1673-9418.1505064

• Artificial Intelligence and Pattern Recognition •

Spectral Theory Based Multi-Label Feature Selection

YAN Peng (严鹏), LI Yun (李云)+

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Online: 2016-04-01 Published: 2016-04-01

Abstract: Feature selection has been studied in depth for traditional single-label problems, but most traditional feature selection algorithms cannot be applied to multi-label problems. Because each instance in a multi-label problem is associated with several labels simultaneously, new criteria are needed to evaluate features. Moreover, since the labels are usually correlated, a feature selection algorithm also needs to model the label structure in order to exploit the correlation information. This paper adopts the spectral feature selection (SPEC) framework to address these problems. The similarity matrix and graph structure required by SPEC are constructed from the Jaccard similarity of the instances' label sets, which reflects the correlations among labels. In addition, the proposed method follows the filter model: it is independent of the classification algorithm and does not need to transform the multi-label problem into single-label problems. Experiments on real-world data sets verify the correctness and good performance of the proposed algorithm.

Key words: multi-label learning, spectral feature selection, label correlation
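
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `jaccard_similarity` and `spec_scores`, the toy data, and the choice of scoring function are assumptions made for illustration only. The sketch builds a similarity matrix from the Jaccard similarity of instance label sets, forms the normalized graph Laplacian, and scores each feature by how smoothly it varies over that graph, in the style of a basic SPEC ranking function where a smaller score means better agreement with the label structure.

```python
import numpy as np

def jaccard_similarity(Y):
    """Pairwise Jaccard similarity between the label sets of instances.

    Y is an (n_samples, n_labels) binary indicator matrix.
    """
    Y = np.asarray(Y, dtype=float)
    inter = Y @ Y.T                                    # |A ∩ B|
    sizes = Y.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter    # |A ∪ B|
    # Assumption: two empty label sets (union == 0) get similarity 0.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

def spec_scores(X, W):
    """Score each feature column f as f_hat^T L_norm f_hat.

    f_hat = D^{1/2} f / ||D^{1/2} f|| and L_norm = D^{-1/2} (D - W) D^{-1/2}.
    Smaller scores mean the feature is smoother on the graph W.
    """
    X = np.asarray(X, dtype=float)
    d = W.sum(axis=1)                                  # node degrees
    d_sqrt = np.sqrt(d)
    d_inv_sqrt = np.divide(1.0, d_sqrt, out=np.zeros_like(d_sqrt), where=d_sqrt > 0)
    L_norm = d_inv_sqrt[:, None] * (np.diag(d) - W) * d_inv_sqrt[None, :]
    F = d_sqrt[:, None] * X                            # D^{1/2} f per column
    norms = np.linalg.norm(F, axis=0)
    F_hat = np.divide(F, norms, out=np.zeros_like(F), where=norms > 0)
    return np.einsum("ij,ik,kj->j", F_hat, L_norm, F_hat)

# Toy data (hypothetical): instances 0-1 share one label set, 2-3 another.
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]])
X = np.array([[1.0, 1.0],    # feature 0 follows the label groups,
              [1.0, 0.0],    # feature 1 does not
              [0.0, 1.0],
              [0.0, 0.0]])

W = jaccard_similarity(Y)
scores = spec_scores(X, W)
ranking = np.argsort(scores)   # best (smallest score) first
print(scores, ranking)         # the label-aligned feature 0 is ranked first
```

Because the score depends only on the data and the label-derived graph, no classifier is trained, which matches the filter-model setting described above. A constant feature also scores 0 under this function, so the sketch assumes constant features have been removed beforehand.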