Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (4): 639-646. DOI: 10.3778/j.issn.1673-9418.1806007

• Artificial Intelligence and Pattern Recognition •

Approximate Multi-Information Diversity

SUN Tao, ZHOU Zhihua+

  1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Online: 2019-04-01 Published: 2019-04-10

Abstract: Ensemble diversity, that is, the difference among individual learners, is a fundamental issue in ensemble learning. Multi-information diversity characterizes ensemble diversity from an information-theoretic perspective and offers a promising direction for understanding it, but in practice the high-order information it requires is usually hard to estimate. This paper proposes a low-order approximation of high-order information based on a particular kind of k-order t-cherry junction tree, which yields an approximate estimate of multi-information diversity. The method covers both approximating multi-information directly with the junction tree and approximating its components, and the relation between the two is analyzed. Experiments show that, at the same approximation order, this estimation method outperforms existing approximate estimation methods.

Key words: machine learning, ensemble diversity, information theory, junction tree
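
To make the idea of a low-order approximation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes discrete classifier outputs, plug-in (empirical) entropy estimates, and a chain-shaped junction tree whose cliques are windows of k consecutive learners with separators of size k-1. This chain is chosen only for illustration, since the abstract does not describe how the t-cherry junction tree is built. The function names `entropy`, `multi_information`, and `chain_junction_tree_estimate` are hypothetical. The sketch relies on the standard identity that, for a junction-tree distribution, multi-information equals the sum of clique multi-informations minus the sum of separator multi-informations.

```python
from collections import Counter
from math import log2


def entropy(columns):
    """Plug-in joint entropy (bits) of equal-length sequences of discrete labels."""
    joint = Counter(zip(*columns))  # empirical joint counts over all columns
    n = sum(joint.values())
    return -sum(c / n * log2(c / n) for c in joint.values())


def multi_information(columns):
    """I(X_1, ..., X_m) = sum_i H(X_i) - H(X_1, ..., X_m); zero for fewer than two variables."""
    if len(columns) < 2:
        return 0.0
    return sum(entropy([col]) for col in columns) - entropy(columns)


def chain_junction_tree_estimate(columns, k):
    """Approximate multi-information using only joint statistics of order <= k.

    Assumption (illustrative only): cliques are sliding windows of k consecutive
    variables and separators are the k-1 variables shared by adjacent windows.
    """
    m = len(columns)
    if not 2 <= k <= m:
        raise ValueError("k must satisfy 2 <= k <= number of learners")
    cliques = [range(s, s + k) for s in range(m - k + 1)]
    separators = [range(s + 1, s + k) for s in range(m - k)]

    def pick(indices):
        return [columns[i] for i in indices]

    clique_part = sum(multi_information(pick(c)) for c in cliques)
    separator_part = sum(multi_information(pick(s)) for s in separators)
    return clique_part - separator_part


if __name__ == "__main__":
    # Hypothetical predictions of three learners on four examples.
    x = [0, 0, 1, 1]
    y = [0, 1, 0, 1]
    z = [a ^ b for a, b in zip(x, y)]  # z depends on x and y only jointly
    print(multi_information([x, y, z]))
    print(chain_junction_tree_estimate([x, y, z], k=2))
```

In the XOR-style example, every pair of learners is empirically independent, so the order-2 estimate misses the dependence that only the full three-way joint reveals. With k equal to the number of learners, the single clique covers all variables and the estimate coincides with the exact plug-in value.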