计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (1): 144-152.DOI: 10.3778/j.issn.1673-9418.2008038
XIA Xiaoqiu1, CHEN Songcan1,+
Received: 2020-08-12
Revised: 2020-12-02
Online: 2022-01-01
Published: 2020-12-08
Corresponding author: + E-mail: s.chen@nuaa.edu.cn
About author: XIA Xiaoqiu, born in 1996, M.S. candidate. Her research interests include pattern recognition and machine learning.
Abstract: Random forest (RF) is one of the most classical machine learning algorithms and has been widely applied. However, although two-view data are abundant in practice and have been extensively studied, RF constructions targeting the two-view scenario remain rare, and the few existing methods that apply RF to two-view learning all first build a separate RF for each view and fuse cross-view information only at decision time. A notable shortcoming of such methods is that the correlation between the two views is not exploited during RF construction, which wastes information. To remedy this, an improved two-view random forest (ITVRF) is proposed. Specifically, canonical correlation analysis (CCA) is used to fuse the views during decision-tree generation, integrating cross-view interaction into the tree-construction stage so that complementary information between the views is exploited throughout the entire RF generation process. In addition, ITVRF generates discriminative decision boundaries for the decision trees via discriminant analysis, making it better suited to classification. Experimental results show that ITVRF achieves better accuracy than the existing two-view RF (TVRF).
XIA Xiaoqiu, CHEN Songcan. Improved Two-View Random Forest[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 144-152.
Dataset | Size | Feature | Class 1 | Class 2 |
---|---|---|---|---|
Iris | 100 | 4 | 50 | 50 |
Banknote | 1 372 | 4 | 762 | 610 |
Ionosphere | 351 | 34 | 126 | 225 |
WBC | 683 | 9 | 444 | 239 |
Seeds | 140 | 7 | 70 | 70 |
Pima | 768 | 8 | 268 | 500 |
Blood | 748 | 4 | 570 | 178 |
Diabetes | 768 | 4 | 500 | 258 |
WDBC | 569 | 30 | 212 | 357 |
Waveform | 3 308 | 40 | 1 653 | 1 655 |
CMC | 962 | 9 | 629 | 333 |
Mushroom | 8 124 | 22 | 3 916 | 4 208 |
Table 1 Statistics for UCI datasets
Learning problem | Instances | Classes and distribution |
---|---|---|
LP1 | 88 | 24% normal, 19% collision, 18% front collision, 39% obstruction |
LP2 | 47 | 43% normal, 13% front collision, 15% back collision, 11% collision to the right, 19% collision to the left |
LP3 | 47 | 43% ok, 19% slightly moved, 32% moved, 6% lost |
LP4 | 117 | 21% normal, 62% collision, 18% obstruction |
LP5 | 164 | 27% normal, 16% bottom collision, 13% bottom obstruction, 29% collision in part, 16% collision in tool |
Table 2 Robot execution failures dataset
Dataset | Instances | View | Feature |
---|---|---|---|
MSRC1-v1 | 210 | 2 | 24 color moment, 576 histogram of oriented gradient |
MSRC2-v1 | 210 | 2 | 256 local binary pattern, 254 CENTRIST features |
Table 3 Two-view dataset information selected from MSRC-v1 datasets
Trees | Max depth | Min samples split | Criterion |
---|---|---|---|
10 | max | 2 | information gain |
Table 4 Experimental simulation parameters
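Assuming the Table 4 values denote the usual forest hyperparameters (tree count, depth limit, minimum split size, split criterion), they map onto scikit-learn's `RandomForestClassifier` roughly as below. This is a plain single-view RF stand-in for reference only; ITVRF itself has no scikit-learn implementation.

```python
from sklearn.ensemble import RandomForestClassifier

# Table 4 parameters mapped onto scikit-learn's RandomForestClassifier
rf = RandomForestClassifier(
    n_estimators=10,      # 10 trees per forest
    max_depth=None,       # "max": grow each tree until its leaves are pure
    min_samples_split=2,  # split a node with as few as 2 samples
    criterion="entropy",  # entropy-based splits, i.e. information gain
)
```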
Dataset | TVRF AUC | TVRF Time/ms | TV_fisherRF AUC | TV_fisherRF Time/ms | ITVRF AUC | ITVRF Time/ms |
---|---|---|---|---|---|---|
Iris | 80.2%±0.08 | 1 | 84.5%±0.08 | 1 | 92.7%±0.05 | 1 |
Banknote | 75.1%±0.02 | 400 | 75.2%±0.08 | 600 | 94.7%±0.01 | 600 |
Ionosphere | 84.8%±0.03 | 500 | 82.2%±0.08 | 350 | 88.6%±0.03 | 450 |
WBC | 86.2%±0.01 | 430 | 87.3%±0.01 | 590 | 90.2%±0.03 | 550 |
Seeds | 92.3%±0.01 | 1 | 92.5%±0.01 | 1 | 93.0%±0.01 | 1 |
Pima | 66.6%±0.05 | 500 | 67.8%±0.06 | 500 | 68.3%±0.05 | 450 |
Blood | 63.7%±0.31 | 450 | 65.6%±0.38 | 800 | 66.2%±0.3 | 800 |
Diabetes | 67.0%±0.04 | 200 | 68.3%±0.07 | 900 | 69.3%±0.06 | 800 |
WDBC | 87.3%±0.02 | 500 | 90.9%±0.01 | 400 | 90.5%±0.02 | 400 |
Waveform | 80.8%±0.03 | 3 800 | 81.0%±0.03 | 650 | 82.5%±0.03 | 550 |
CMC | 49.8%±0.19 | 450 | 48.6%±0.10 | 900 | 50.3%±0.09 | 850 |
Mushroom | 97.2%±0.01 | 900 | 97.9%±0.02 | 900 | 100.0%±0.00 | 800 |
SPECTF | 51.6%±0.01 | 480 | 58.8%±0.01 | 600 | 71.4%±0.02 | 650 |
Robot execution failures | 91.9%±0.01 | 500 | 92.1%±0.04 | 350 | 97.3%±0.01 | 350 |
MSRC1-v1 | 91.4%±0.02 | 7 200 | 92.2%±0.04 | 5 500 | 93.5%±0.01 | 4 500 |
MSRC2-v1 | 90.6%±0.04 | 7 800 | 91.1%±0.05 | 6 200 | 94.2%±0.02 | 6 600 |
Table 5 AUC value and running time
Dataset | MLRA | ITVRF |
---|---|---|
Iris | 84.2%±0.02 | 92.7%±0.05 |
Ionosphere | 86.7%±0.03 | 88.6%±0.03 |
Pima | 73.9%±0.03 | 68.3%±0.05 |
WDBC | 93.1%±0.01 | 90.5%±0.02 |
Waveform | 76.8%±0.02 | 82.5%±0.03 |
Table 6 AUC values of ITVRF and multi-view method MLRA