Journal of Frontiers of Computer Science and Technology ›› 2012, Vol. 6 ›› Issue (8): 726-740. DOI: 10.3778/j.issn.1673-9418.2012.08.006

• Academic Research •


Decision Tree for Uncertain Data Based on Reachable Probability Intervals

CHEN Hongmei1, WANG Lizhen1+, LIU Weiyi1, YUAN Lijian2   

  1. Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650091, China
  2. Library, Kunming Metallurgy College, Kunming 650033, China
  • Online: 2012-08-01 Published: 2012-08-06

Abstract: In practice, the probability distributions of uncertain data are often difficult to obtain. This paper therefore studies decision trees for value-uncertain discrete objects whose probability distributions are missing. First, it defines (conditional) probability intervals and proves that they are reachable probability intervals. Second, based on the reachable probability intervals, it defines (conditional) entropy intervals and gives a method for computing their upper and lower bounds. Finally, using the conditional entropy interval as the attribute selection measure, it proposes a new decision tree for uncertain data, which extends decision trees that partition objects by 0-1 assignment to decision trees that assign objects to branches with probability intervals. The resulting tree can handle both value-uncertain discrete objects with missing probability distributions and certain discrete objects. Experiments on uncertain datasets derived from UCI datasets confirm that the uncertain decision tree is effective.

Key words: value-uncertain discrete objects with missing probability distributions, decision tree, reachable probability interval, conditional entropy interval
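
To make the idea of a probability interval and an entropy interval concrete, the following is a minimal sketch in Python. It is not the paper's definitions or algorithm, which are not given on this page. It assumes each value-uncertain object is represented as a set of possible values with no probabilities attached, takes the lower bound of a value's probability as the fraction of objects that certainly have it and the upper bound as the fraction that possibly have it, and then treats the intervals as simple box constraints when bounding the Shannon entropy. The function names `probability_intervals` and `entropy_interval` are hypothetical.

```python
import math
from itertools import product


def probability_intervals(objects, domain):
    """Map each value v to (lower, upper) bounds on its probability.

    Assumption: an object is a set of possible values. An object counts
    toward the lower bound only if v is its sole possible value, and
    toward the upper bound if v is among its possible values.
    """
    n = len(objects)
    return {
        v: (sum(s == {v} for s in objects) / n,
            sum(v in s for s in objects) / n)
        for v in domain
    }


def entropy(p):
    """Shannon entropy in bits, with 0 * log(0) taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def entropy_interval(intervals, iters=100):
    """Return (min, max) entropy over all p with low <= p <= high, sum(p) = 1."""
    lows = [lo for lo, _ in intervals.values()]
    highs = [hi for _, hi in intervals.values()]
    n = len(lows)

    # Maximum: every p_i is a common level c clipped to its interval;
    # bisect on c until the clipped values sum to 1.
    a, b = 0.0, 1.0
    for _ in range(iters):
        c = (a + b) / 2
        if sum(min(max(c, lo), hi) for lo, hi in zip(lows, highs)) < 1:
            a = c
        else:
            b = c
    c = (a + b) / 2
    h_max = entropy([min(max(c, lo), hi) for lo, hi in zip(lows, highs)])

    # Minimum: entropy is concave, so it is attained at a vertex of the
    # feasible region, where at most one coordinate is strictly inside
    # its interval. Enumerate those vertices (fine for small domains).
    h_min = float("inf")
    for free in range(n):
        others = [i for i in range(n) if i != free]
        for choice in product((0, 1), repeat=n - 1):
            p = [0.0] * n
            for i, pick in zip(others, choice):
                p[i] = highs[i] if pick else lows[i]
            p[free] = 1.0 - sum(p)
            if lows[free] - 1e-12 <= p[free] <= highs[free] + 1e-12:
                h_min = min(h_min, entropy([max(x, 0.0) for x in p]))
    return h_min, h_max


# Four objects; the third one is value-uncertain between "a" and "b".
objs = [{"a"}, {"a"}, {"a", "b"}, {"b"}]
iv = probability_intervals(objs, ["a", "b"])   # a: (0.5, 0.75), b: (0.25, 0.5)
print(iv, entropy_interval(iv))                # entropy roughly in [0.811, 1.0]
```

In this small example both ends of each interval are reached by some assignment of the uncertain object (it takes either "a" or "b"), which is the intuition behind calling an interval reachable. When every object is certain, the lower and upper bounds coincide and the entropy interval collapses to the ordinary entropy. The vertex enumeration is exponential in the number of values and is only meant for illustration.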