Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (12): 2928-2941. DOI: 10.3778/j.issn.1673-9418.2308062

• Theory·Algorithm •

Incremental Feature Selection Oriented for Data with Hierarchical Structure

SHE Yanhong, HUANG Wanli, HE Xiaoli, QIAN Ting   

  1. College of Science, Xi’an Shiyou University, Xi’an 710065, China
  2. College of Computer, Xi’an Shiyou University, Xi’an 710065, China
  • Online: 2023-12-01 Published: 2023-12-01

面向层次结构数据的增量特征选择

折延宏,黄婉丽,贺晓丽,钱婷   

  1. 西安石油大学 理学院,西安 710065
  2. 西安石油大学 计算机学院,西安 710065

Abstract: In the big data era, sample sizes keep growing, data dimensionality is becoming extremely high, and class labels often exhibit a complex hierarchical structure. This paper investigates incremental feature selection for hierarchical classification based on the dependency degree under the inclusive strategy, and addresses the hierarchical classification problem in which the labels form a tree and may be located at arbitrary nodes of that tree. Firstly, the inclusive strategy is used to reduce the negative sample space by exploiting the hierarchical label structure. Secondly, a new fuzzy rough set model based on the inclusive strategy is introduced, together with an algorithm for computing the corresponding dependency degree and a non-incremental feature selection algorithm. Then, an incremental mechanism is introduced to update the inclusive-strategy-based dependency degree, and two incremental feature selection frameworks based on two strategies are designed. Lastly, the proposed method is compared with the method based on the sibling strategy, and numerical experiments verify the feasibility and efficiency of the proposed algorithms.

Key words: fuzzy rough sets, dependency degree, hierarchical classification, incremental feature selection, inclusive strategy

摘要: 随着大数据时代的到来,数据样本量越来越多,维度越来越高,同时样本标签存在复杂的层次结构关系。采用包含策略,研究了基于依赖度的分层分类增量特征选择,解决了标签具有树结构且标签分布在任意节点的分层分类问题。首先,利用标签之间的层次结构,采用包含策略来缩小负样本空间。其次,使用模糊粗糙集理论,提出了一个基于包含策略的模糊粗糙集模型,设计了一个基于包含策略的依赖度计算算法和一个非增量特征选择算法。基于此,引入增量机制,提出了基于包含策略的依赖度增量更新方法,设计了两个基于两种策略的增量特征选择算法。最后,将此方法与基于兄弟策略的依赖度进行对比,通过实验验证了所提方法的可行性与高效性。

关键词: 模糊粗糙集, 依赖度, 分层分类, 增量特征选择, 包含策略
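
The abstract does not give formulas, so the following Python sketch is only an illustration of the ideas it names, not the authors' algorithms. It assumes the definition of the inclusive strategy that is common in the hierarchical classification literature: for a sample with label d, the negative samples are all samples whose label is neither d nor an ancestor or descendant of d. It also assumes a simple fuzzy similarity (one minus the mean absolute difference of features scaled to [0, 1]) and assumes that the incremental mechanism concerns newly arriving samples. The names `related_labels`, `lower_memberships`, `add_samples`, and the toy label tree are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set

Tree = Dict[str, Optional[str]]      # child label -> parent label (None for the root)
Samples = Sequence[Sequence[float]]  # feature values, assumed scaled to [0, 1]


def related_labels(label: str, parent: Tree) -> Set[str]:
    """Return the label together with all of its ancestors and descendants."""
    related = {label}
    node = parent[label]
    while node is not None:              # walk up to the root
        related.add(node)
        node = parent[node]
    stack = [label]
    while stack:                         # walk down through the subtree
        node = stack.pop()
        for child, par in parent.items():
            if par == node:
                related.add(child)
                stack.append(child)
    return related


def distance(x: Sequence[float], y: Sequence[float], features: Sequence[int]) -> float:
    """1 - similarity, where similarity = 1 - mean absolute difference (an assumption)."""
    return sum(abs(x[f] - y[f]) for f in features) / len(features)


def lower_memberships(X: Samples, labels: Sequence[str], parent: Tree,
                      features: Sequence[int]) -> List[float]:
    """Fuzzy lower-approximation membership of each sample to its own label,
    taking the minimum only over negatives chosen by the assumed inclusive strategy."""
    lows = []
    for x, lab in zip(X, labels):
        excluded = related_labels(lab, parent)
        dists = [distance(x, y, features)
                 for y, other in zip(X, labels) if other not in excluded]
        lows.append(min(dists, default=1.0))   # no negatives: full membership
    return lows


def dependency(lows: Sequence[float]) -> float:
    """Dependency degree as the mean lower-approximation membership."""
    return sum(lows) / len(lows)


def add_samples(old_lows: Sequence[float], X_old: Samples, labels_old: Sequence[str],
                X_new: Samples, labels_new: Sequence[str], parent: Tree,
                features: Sequence[int]) -> List[float]:
    """Update memberships when new samples arrive, without recomputing old pairs."""
    updated = []
    for x, lab, low in zip(X_old, labels_old, old_lows):
        excluded = related_labels(lab, parent)
        dists = [distance(x, y, features)
                 for y, other in zip(X_new, labels_new) if other not in excluded]
        updated.append(min([low] + dists))     # only new negatives can lower the value
    X_all = list(X_old) + list(X_new)
    labels_all = list(labels_old) + list(labels_new)
    for x, lab in zip(X_new, labels_new):      # new samples are compared with everything
        excluded = related_labels(lab, parent)
        dists = [distance(x, y, features)
                 for y, other in zip(X_all, labels_all) if other not in excluded]
        updated.append(min(dists, default=1.0))
    return updated


# Hypothetical toy data: a two-level label tree and one feature per sample.
PARENT: Tree = {"root": None, "A": "root", "B": "root",
                "A1": "A", "A2": "A", "B1": "B"}
X_old, labels_old = [[0.0], [0.1], [1.0]], ["A1", "A", "B1"]
X_new, labels_new = [[0.2]], ["A2"]

old_lows = lower_memberships(X_old, labels_old, PARENT, [0])
new_lows = add_samples(old_lows, X_old, labels_old, X_new, labels_new, PARENT, [0])
print(dependency(old_lows), dependency(new_lows))
```

In this sketch the sample labeled `A` never treats `A1` or `A2` samples as negatives, which is how the assumed inclusive strategy shrinks the negative sample space. Because the lower-approximation membership is a minimum over negatives, adding samples can only keep or decrease the stored values, so the update compares old samples with the new arrivals only. A greedy feature selection loop could call `dependency` on candidate feature subsets, but how the paper's two frameworks do this is not stated in the abstract.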