Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (10): 1768-1780. DOI: 10.3778/j.issn.1673-9418.1807067


Feature Selection Algorithm in Multi-Label Incomplete Data

QIAN Wenbin, HUANG Qin, WANG Yinglong, YANG Jun   

  1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China
  2. School of Software, Jiangxi Agricultural University, Nanchang 330045, China
  • Online: 2019-10-01 Published: 2019-10-15




Abstract: Feature selection for multi-label data is an important research topic in machine learning and data mining. Most existing multi-label feature selection methods assume complete data. In many applications, however, the data are largely continuous-valued and are often incomplete because of high diagnostic costs, privacy protection, or other factors. To address this issue, a feature selection algorithm for multi-label incomplete data is proposed. The neighborhood rough set model is applied to multi-label incomplete data, and the neighborhood granularity of the data is computed according to a neighborhood threshold. A feature significance criterion for multi-label incomplete data is then defined on the basis of neighborhood granularity, and a feature selection algorithm is designed from it. Finally, experimental results on Mulan datasets verify the effectiveness and feasibility of the proposed algorithm.

Key words: incomplete data, rough sets, feature selection, attribute reduction
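
Illustrative sketch: The abstract names the ingredients of the method (a neighborhood threshold, neighborhood granularity, a significance criterion, and a selection procedure) but not their exact definitions. The Python sketch below shows one plausible way these pieces could fit together and should not be read as the authors' algorithm. Everything specific in it is an assumption made for illustration: missing values are encoded as NaN and treated as compatible with any value, features are assumed to be scaled to [0, 1], distance is Chebyshev, the measure is a generic conditional granularity (the fraction of neighboring sample pairs whose label vectors differ), and the search is a greedy forward selection. The function names and the toy data are hypothetical.

```python
import numpy as np


def tolerant_distance(X, features):
    """Pairwise Chebyshev distance over `features`.

    Assumption: a missing value (NaN) is compatible with any value,
    so it contributes a distance of 0 on that feature.
    """
    sub = X[:, features]                               # shape (n, k)
    diff = np.abs(sub[:, None, :] - sub[None, :, :])   # NaN where a value is missing
    diff = np.nan_to_num(diff, nan=0.0)
    return diff.max(axis=2)


def conditional_granularity(X, Y, features, delta):
    """Fraction of ordered sample pairs that are delta-neighbors under
    `features` yet carry different label vectors (lower is better)."""
    n = X.shape[0]
    if not features:
        # With no features every sample is a neighbor of every other one.
        neighbors = np.ones((n, n), dtype=bool)
    else:
        neighbors = tolerant_distance(X, features) <= delta
    same_labels = (Y[:, None, :] == Y[None, :, :]).all(axis=2)
    return np.sum(neighbors & ~same_labels) / n ** 2


def select_features(X, Y, delta=0.1, tol=1e-12):
    """Greedy forward selection: repeatedly add the feature whose
    significance (drop in conditional granularity) is largest."""
    remaining = list(range(X.shape[1]))
    selected = []
    current = conditional_granularity(X, Y, selected, delta)
    while remaining and current > tol:
        scores = {
            a: conditional_granularity(X, Y, selected + [a], delta)
            for a in remaining
        }
        best = min(scores, key=scores.get)
        if current - scores[best] <= tol:   # no feature is significant any more
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


# Hypothetical toy data: 4 samples, 2 features (NaN = missing), 2 labels.
X_demo = np.array([
    [0.10, 0.50],
    [0.15, np.nan],
    [0.90, 0.50],
    [np.nan, 0.52],
])
Y_demo = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

print(select_features(X_demo, Y_demo, delta=0.1))
```

In this toy example only the first feature separates samples with different label vectors, so it is the only one the greedy loop keeps. The threshold `delta` controls how coarse the neighborhoods are: larger values merge more samples into each neighborhood and make label conflicts more likely.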
