Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (1): 144-152.DOI: 10.3778/j.issn.1673-9418.2008038

• Artificial Intelligence •

Improved Two-View Random Forest

XIA Xiaoqiu1, CHEN Songcan1,+

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Received:2020-08-12 Revised:2020-12-02 Online:2022-01-01 Published:2020-12-08
  • About author:XIA Xiaoqiu, born in 1996, M.S. candidate. Her research interests include pattern recognition and machine learning.
    CHEN Songcan, born in 1962, professor, fellow of CAAI and IAPR. His research interests include pattern recognition and machine learning.
  • Supported by:
    National Natural Science Foundation of China(61672281);National Natural Science Foundation of China(61732006)

改进的二视图随机森林

夏笑秋1, 陈松灿1,+

  1. 南京航空航天大学 计算机科学与技术学院,南京 210016
  2. 南京航空航天大学 模式分析与机器智能工信部重点实验室,南京 210016
  • Corresponding author: + E-mail: s.chen@nuaa.edu.cn
  • 作者简介:夏笑秋(1996—),女,硕士研究生,主要研究方向为模式识别、机器学习。
    陈松灿(1962—),男,教授,CAAI会士,IAPR会士,主要研究方向为模式识别、机器学习。
  • 基金资助:
    国家自然科学基金(61672281);国家自然科学基金(61732006)

Abstract:

Random forest (RF) is one of the most classic machine learning methods and has been widely used. However, although two-view data are common in practice and have been studied extensively, few RF methods have been designed for two-view scenarios. The existing RF methods for two-view learning first generate an RF for each view separately and merge the view information only at decision time. An obvious disadvantage of this approach is that the correlation between views is not exploited during the RF construction stage, which wastes available information. To remedy this, an improved two-view RF (ITVRF) is proposed in this paper. Specifically, canonical correlation analysis (CCA) is used for view fusion while the decision trees are generated, so that the information interaction between views is embedded in the tree construction stage and the complementary information between views is used throughout the RF generation process. In addition, ITVRF generates discriminative decision boundaries for the decision trees through discriminant analysis, which makes it more suitable for classification. Experimental results show that ITVRF achieves better accuracy than the existing two-view RF (TVRF).
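
The idea of fusing the two views with CCA inside a tree node and then placing a discriminant boundary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_cca`, `fuse`, `fisher_split`, `node_split`), the ridge term `reg`, the choice to fuse by concatenating the canonical projections, the two-class Fisher discriminant, and the midpoint threshold are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np

def fit_cca(X, Y, k=1, reg=1e-3):
    """Top-k canonical directions of two views (regularized; hypothetical helper)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with its Cholesky factor, then take an SVD of the
    # whitened cross-covariance; singular values are canonical correlations.
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt[:k].T)
    return (mx, Wx), (my, Wy), s[:k]

def fuse(X, Y, px, py):
    """Assumed fusion rule: concatenate the canonical projections of both views."""
    (mx, Wx), (my, Wy) = px, py
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])

def fisher_split(Z, y, reg=1e-3):
    """Two-class Fisher discriminant direction w and midpoint threshold b."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    w = np.linalg.solve(Sw + reg * np.eye(Z.shape[1]), m1 - m0)
    b = w @ (m0 + m1) / 2.0
    return w, b

def node_split(X, Y, y, k=1):
    """One tree-node split: CCA fusion followed by a discriminant boundary."""
    px, py, corr = fit_cca(X, Y, k)
    Z = fuse(X, Y, px, py)
    w, b = fisher_split(Z, y)
    goes_right = Z @ w > b  # boolean mask of samples routed to the right child
    return goes_right, corr
```

In a full forest, a split of this kind would be applied recursively on bootstrap samples and random feature subsets of each view. How ITVRF actually does this is described in the paper itself, not in this sketch.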

Key words: decision tree, random forest (RF), two-view learning, canonical correlation analysis (CCA)

摘要:

随机森林(RF)是最经典的机器学习算法之一,并已获得广泛应用。然而观察发现,尽管现实中存在众多的二视图数据并已获得广泛的分析研究,但针对二视图场景的RF构建相当少,仅有的利用RF解决二视图学习问题的方法也都是先为各个视图生成各自的RF,在决策时才融合了视图间的信息。这样的方法存在一个显著不足是在其RF的构建阶段未利用两个视图间的相关性,这无疑浪费了信息资源。为了弥补这一不足,提出了一种改进的二视图随机森林(ITVRF)。具体而言,在决策树的生成过程中采用典型相关分析(CCA)进行视图融合,将视图间的信息交互融入到了决策树的构建阶段,实现了视图间互补信息在整个RF生成过程中的利用。此外,ITVRF还通过判别分析为决策树生成判别决策边界,更适合于分类。实验结果表明ITVRF比现有的二视图RF(TVRF)有着更优的准确率。

关键词: 决策树, 随机森林(RF), 二视图学习, 典型相关分析(CCA)

CLC Number: