计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (1): 144-152.DOI: 10.3778/j.issn.1673-9418.2008038
XIA Xiaoqiu1, CHEN Songcan1,+
Received: 2020-08-12
Revised: 2020-12-02
Online: 2022-01-01
Published: 2020-12-08
Corresponding author: + E-mail: s.chen@nuaa.edu.cn
About author: XIA Xiaoqiu, born in 1996, M.S. candidate. Her research interests include pattern recognition and machine learning.
Abstract: Random forest (RF) is one of the most classical machine learning algorithms and has been widely applied. However, although two-view data are abundant in practice and have been extensively studied, RF constructions targeting the two-view scenario remain rare, and the few existing methods that apply RF to two-view learning all first build a separate RF for each view and fuse cross-view information only at decision time. A notable shortcoming of such methods is that the correlation between the two views is not exploited during RF construction, which wastes information. To remedy this, an improved two-view random forest (ITVRF) is proposed. Specifically, canonical correlation analysis (CCA) is used to fuse the views during decision-tree generation, integrating cross-view interaction into the tree-construction stage so that complementary information between the views is exploited throughout the entire RF generation process. In addition, ITVRF generates discriminative decision boundaries for the decision trees via discriminant analysis, making it better suited to classification. Experimental results show that ITVRF achieves better accuracy than the existing two-view RF (TVRF).
XIA Xiaoqiu, CHEN Songcan. Improved Two-View Random Forest[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 144-152.
Dataset | Size | Feature | Class 1 | Class 2 |
---|---|---|---|---|
Iris | 100 | 4 | 50 | 50 |
Banknote | 1 372 | 4 | 762 | 610 |
Ionosphere | 351 | 34 | 126 | 225 |
WBC | 683 | 9 | 444 | 239 |
Seeds | 140 | 7 | 70 | 70 |
Pima | 768 | 8 | 268 | 500 |
Blood | 748 | 4 | 570 | 178 |
Diabetes | 768 | 4 | 500 | 258 |
WDBC | 569 | 30 | 212 | 357 |
Waveform | 3 308 | 40 | 1 653 | 1 655 |
CMC | 962 | 9 | 629 | 333 |
Mushroom | 8 124 | 22 | 3 916 | 4 208 |
Table 1 Statistics for UCI datasets
Learning problem | Instances | Classes and distribution |
---|---|---|
LP1 | 88 | 24% normal, 19% collision, 18% front collision, 39% obstruction |
LP2 | 47 | 43% normal, 13% front collision, 15% back collision, 11% collision to the right, 19% collision to the left |
LP3 | 47 | 43% ok, 19% slightly moved, 32% moved, 6% lost |
LP4 | 117 | 21% normal, 62% collision, 18% obstruction |
LP5 | 164 | 27% normal, 16% bottom collision, 13% bottom obstruction, 29% collision in part, 16% collision in tool |
Table 2 Robot execution failures dataset
Dataset | Instances | View | Feature |
---|---|---|---|
MSRC1-v1 | 210 | 2 | 24 color moment, 576 histogram of oriented gradient |
MSRC2-v1 | 210 | 2 | 256 local binary pattern, 254 CENTRIST features |
Table 3 Two-view dataset information selected from MSRC-v1 datasets
Trees | Max depth | Min samples split | Criterion |
---|---|---|---|
10 | max | 2 | information gain |
Table 4 Experimental simulation parameters
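Assuming the Table 4 values denote the usual forest hyperparameters (tree count, depth limit, minimum split size, split criterion), they map onto scikit-learn's `RandomForestClassifier` roughly as below. This is a plain single-view RF stand-in for reference only; ITVRF itself has no scikit-learn implementation.

```python
from sklearn.ensemble import RandomForestClassifier

# Table 4 parameters mapped onto scikit-learn's RandomForestClassifier
rf = RandomForestClassifier(
    n_estimators=10,      # 10 trees per forest
    max_depth=None,       # "max": grow each tree until its leaves are pure
    min_samples_split=2,  # split a node with as few as 2 samples
    criterion="entropy",  # entropy-based splits, i.e. information gain
)
```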
Dataset | TVRF AUC | TVRF Time/ms | TV_fisherRF AUC | TV_fisherRF Time/ms | ITVRF AUC | ITVRF Time/ms |
---|---|---|---|---|---|---|
Iris | 80.2%±0.08 | 1 | 84.5%±0.08 | 1 | 92.7%±0.05 | 1 |
Banknote | 75.1%±0.02 | 400 | 75.2%±0.08 | 600 | 94.7%±0.01 | 600 |
Ionosphere | 84.8%±0.03 | 500 | 82.2%±0.08 | 350 | 88.6%±0.03 | 450 |
WBC | 86.2%±0.01 | 430 | 87.3%±0.01 | 590 | 90.2%±0.03 | 550 |
Seeds | 92.3%±0.01 | 1 | 92.5%±0.01 | 1 | 93.0%±0.01 | 1 |
Pima | 66.6%±0.05 | 500 | 67.8%±0.06 | 500 | 68.3%±0.05 | 450 |
Blood | 63.7%±0.31 | 450 | 65.6%±0.38 | 800 | 66.2%±0.3 | 800 |
Diabetes | 67.0%±0.04 | 200 | 68.3%±0.07 | 900 | 69.3%±0.06 | 800 |
WDBC | 87.3%±0.02 | 500 | 90.9%±0.01 | 400 | 90.5%±0.02 | 400 |
Waveform | 80.8%±0.03 | 3 800 | 81.0%±0.03 | 650 | 82.5%±0.03 | 550 |
CMC | 49.8%±0.19 | 450 | 48.6%±0.10 | 900 | 50.3%±0.09 | 850 |
Mushroom | 97.2%±0.01 | 900 | 97.9%±0.02 | 900 | 100.0%±0.00 | 800 |
SPECTF | 51.6%±0.01 | 480 | 58.8%±0.01 | 600 | 71.4%±0.02 | 650 |
Robot execution failures | 91.9%±0.01 | 500 | 92.1%±0.04 | 350 | 97.3%±0.01 | 350 |
MSRC1-v1 | 91.4%±0.02 | 7 200 | 92.2%±0.04 | 5 500 | 93.5%±0.01 | 4 500 |
MSRC2-v1 | 90.6%±0.04 | 7 800 | 91.1%±0.05 | 6 200 | 94.2%±0.02 | 6 600 |
Table 5 AUC value and running time
Dataset | MLRA | ITVRF |
---|---|---|
Iris | 84.2%±0.02 | 92.7%±0.05 |
Ionosphere | 86.7%±0.03 | 88.6%±0.03 |
Pima | 73.9%±0.03 | 68.3%±0.05 |
WDBC | 93.1%±0.01 | 90.5%±0.02 |
Waveform | 76.8%±0.02 | 82.5%±0.03 |
Table 6 AUC values of ITVRF and multi-view method MLRA