Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2019, Vol. 13, Issue (1): 158-168. DOI: 10.3778/j.issn.1673-9418.1806020

• Theory and Algorithm •

Dynamic Mutual Information Feature Selection for Functional Data

MA Chen1, JIANG Gaoxia1, WANG Wenjian2+

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Online: 2019-01-01 Published: 2019-01-09

Abstract: Functional data treat the observed data as a whole and focus on the intrinsic structure of the data rather than only its presentation form. Because they carry more information than traditional data, the analysis of functional data is of considerable value, and feature selection is one of the problems that functional data analysis must address. This paper proposes a dynamic mutual information (DMI) feature selection method for functional data. Taking the intrinsic characteristics of the data into account, DMI uses mutual information to rank features and select them dynamically. It obtains a stable feature subset, fully considers the role of samples in feature selection, and largely avoids redundant information. The paper further proposes a dynamic conditional mutual information (DCMI) feature selection method. Because features that have already been selected influence the selection of subsequent ones, DCMI introduces conditional mutual information to quantify the influence of the selected features on the candidate features, which describes the relationship between a feature and a feature set more appropriately. Experimental results on the UCR datasets show that the feature subsets selected by DMI and DCMI are small and yield high classification accuracy.

Key words: functional data, feature selection, mutual information, dynamic mutual information (DMI), dynamic conditional mutual information (DCMI)
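
The abstract does not spell out the DMI or DCMI procedures, so the following is only a generic illustration of the underlying idea, not the authors' algorithm. It ranks features by mutual information with the class label, I(X;Y) = H(X) + H(Y) - H(X,Y), and then scores each remaining candidate by conditional mutual information given the features already chosen, I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). All function names, the toy data, and the stopping threshold are assumptions made for illustration. The sketch also assumes features are already discrete, so functional (curve) data would need to be discretized first, and conditioning on the joint value of all selected features is a simplification that becomes unreliable when samples are scarce.

```python
import math
from collections import Counter


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def greedy_select(features, labels, k, tol=1e-12):
    """Hypothetical greedy selector (illustrative, not the paper's method).

    features: dict mapping feature name -> list of discrete values.
    Each round picks the candidate with the largest mutual information
    with the labels, conditioned on the joint value of the features
    selected so far. Stops after k features or when no candidate adds
    information beyond `tol`.
    """
    selected = []
    remaining = dict(features)
    # Per-sample tuple of the selected features' values (empty at the start,
    # in which case the conditional score reduces to plain I(X;Y)).
    context = [()] * len(labels)
    while remaining and len(selected) < k:
        scores = {
            name: conditional_mutual_information(col, labels, context)
            for name, col in remaining.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] <= tol:
            break  # every remaining feature is redundant given the selection
        selected.append(best)
        context = [c + (v,) for c, v in zip(context, remaining.pop(best))]
    return selected


# Toy data: the label is determined by a and b together; a_copy duplicates a.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b}

chosen = greedy_select(features, labels, k=3)
print(chosen)  # ['a', 'b']
```

In this toy case all three features have the same mutual information with the label (1 bit), so plain ranking cannot distinguish `a_copy` from `b`. Conditioning on the already selected `a` drives the score of `a_copy` to zero while `b` keeps its full bit, which is the kind of redundancy that conditioning on selected features is meant to expose.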