计算机科学与探索 ›› 2021, Vol. 15 ›› Issue (4): 754-765.DOI: 10.3778/j.issn.1673-9418.2005041

• 理论与算法 • 上一篇    下一篇

混合局部因果结构学习

王雲霞,曹付元,凌兆龙   

  1. 1. 山西大学 计算机与信息技术学院,太原 030006
    2. 山西大学 计算智能与中文信息处理教育部重点实验室,太原 030006
    3. 安徽大学 计算机科学与技术学院,合肥 230601
  • 出版日期:2021-04-01 发布日期:2021-04-02

Hybrid Local Causal Structure Learning

WANG Yunxia, CAO Fuyuan, LING Zhaolong   

  1. 1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
    3. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Online:2021-04-01 Published:2021-04-02

摘要:

局部因果结构学习是发现和学习给定一个目标变量的直接原因和直接结果而无需学习一个完整因果网络的过程。目前已有算法通常由两个步骤完成:步骤1使用约束类算法利用独立性测试学习目标变量的马尔科夫毯(MB)或父子节点集(PC),但是该步骤由于受到有限的数据样本量等因素影响使得独立性测试存在一定的错误性,而导致该步骤精度通常不是很高;步骤2利用V结构及Meek规则来进行边的定向,但是该步骤由于极其依赖于V结构的发现且同样受到有限样本的影响,使得算法精度相对不是很高。基于上述问题,提出利用打分和限制相结合的混合方式来缓减有限样本问题且提高算法精度。步骤1通过在基于限制的算法中加入打分思想来提高数据有效性,进而提出SIAPC算法;步骤2通过利用PC算法得到的定向结果和对部分数据集打分得到的定向结果的交集来确定边的方向,以此来降低对V结构的依赖性且缓减有限样本问题,之后使用独立性测试修正边的定向结果来进一步提高算法精度,进而提出HLCS算法。在标准贝叶斯网络上,实验验证了该算法相对于已有算法在精度方面具有更好的性能且能够有效缓减数据效率问题。

关键词: 贝叶斯网络, 局部因果结构学习, 马尔科夫毯(MB), V结构

Abstract:

Local causal structure learning focuses on identifying the direct causes and direct effects of a given target variable without learning an entire causal network. Existing local causal structure learning algorithms are usually completed by two steps. Step 1 uses constraint-based methods to learn the Markov blanket (MB) or parents and children (PC) set of the target variable by conditional independence tests. However, due to small sample sizes, this may lead to unreliable tests and the accuracy of this step is usually not very high. Step 2 uses found V structures and Meek rules for distinguishing direct causes from direct effects of the target variable. But this step depends on the discovery of V structure extremely and synchronous sampling is also affected by limited samples. The accuracy of the algorithm is not very high. To solve the above problems, this paper proposes a hybrid local causal structure learning algorithm based on the combination of scoring and constraint. In step 1, a new PC learning algorithm SIAPC (score-based incremental association parents and children) is proposed by adding scoring idea into the constraint based algorithm. In step 2, the direction of the edge is determined by using the intersection of the orientation result obtained by PC algorithm and the orientation result obtained by grading some data sets, so as to reduce the dependence on V structure and alleviate the finite sample problem. After that, this paper uses independence test to modify the orientation results of the edges to further improve the accuracy of the algorithm, and then proposes HLCS (hybrid local causal structure learning) algorithm. Using benchmark Bayesian networks, the experimental results show that the algorithm proposed in this paper has better performance than the existing algorithms in terms of learning accuracy and reducing the data efficiency.

Key words: Bayesian network, local causal structure learning, Markov blanket (MB), V-structure