Journal of Frontiers of Computer Science and Technology ›› 2013, Vol. 7 ›› Issue (1): 35-45. DOI: 10.3778/j.issn.1673-9418.1206048

• Academic Research •


Parallel Algorithm Model for Knowledge Reduction Using MapReduce

QIAN Jin1,2,3, MIAO Duoqian1,3+, ZHANG Zehua1,3, ZHANG Zhifei1,3   

1. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
    2. School of Computer Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
    3. Key Laboratory of Embedded System & Service Computing, Ministry of Education of China, Tongji University, Shanghai 201804, China
  • Online: 2013-01-01  Published: 2012-12-29


Abstract:

Knowledge reduction for massive datasets has attracted much research interest in rough set theory. Classical knowledge reduction algorithms assume that the entire dataset can be loaded into the main memory of a single machine, which makes them infeasible for large-scale data. This paper first analyzes the parallelizable computations in classical knowledge reduction algorithms. Then, to compute the equivalence classes and attribute significance induced by different candidate attribute sets, it designs and implements Map and Reduce functions that exploit both data and task parallelism. Finally, it constructs a parallel algorithm model for knowledge reduction using MapReduce, which can compute a reduct for algorithms based on the positive region, the discernibility matrix, or information entropy. Experimental results on the Hadoop platform demonstrate that the proposed parallel knowledge reduction algorithm model can process massive datasets efficiently.
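The core MapReduce step the abstract describes can be sketched in a few lines. This is an illustrative, in-memory sketch, not the paper's implementation: the record layout, function names, and the local "shuffle" (a dictionary standing in for Hadoop's grouping by key) are assumptions for demonstration. Map emits each object's restriction to a candidate attribute subset B as the key; Reduce groups objects into equivalence classes and flags each class as consistent (all decisions agree), the property used to compute the positive region.

```python
from collections import defaultdict

def map_phase(records, B):
    """Map: for each record, emit (signature of record on B, decision)."""
    for row in records:
        key = tuple(row[a] for a in B)   # equivalence-class signature on B
        yield key, row["d"]              # "d" = decision attribute (assumed name)

def reduce_phase(pairs):
    """Reduce: group pairs by key into equivalence classes.
    A class belongs to the positive region iff all its decisions agree."""
    classes = defaultdict(list)
    for key, d in pairs:                 # stands in for Hadoop's shuffle/group
        classes[key].append(d)
    return {k: (len(v), len(set(v)) == 1) for k, v in classes.items()}

# Toy decision table: two condition attributes a1, a2 and a decision d.
table = [
    {"a1": 0, "a2": 1, "d": "yes"},
    {"a1": 0, "a2": 1, "d": "yes"},
    {"a1": 1, "a2": 0, "d": "no"},
    {"a1": 1, "a2": 1, "d": "yes"},
]

result = reduce_phase(map_phase(table, ["a1", "a2"]))
# Each key is one equivalence class; the tuple is (class size, consistent?).
```

Attribute significance can then be estimated by rerunning the same two phases with an attribute removed from B and comparing the resulting positive-region sizes, which is what makes the computation naturally data- and task-parallel.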

Key words: MapReduce, rough set, knowledge reduction, data parallelism, task parallelism