Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (12): 1941-1952. DOI: 10.3778/j.issn.1673-9418.1609012

Research and Implementation of Framework for Large-Scale Multi-Dimensional Network Analysis

WANG Ze’ao+, WU Bin, WU Xinyu, ZHANG Zixing   

  1. College of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100080, China
  • Online: 2017-12-01 Published: 2017-12-07

Research and Implementation of a Framework for Large-Scale Multi-Dimensional Network Data Analysis

王泽奥+,吴斌,吴心宇,张子兴

  1. 北京邮电大学 计算机学院,北京 100080

Abstract: With the rapid development of the Internet and the growing number of computer applications, large volumes of graph data, especially social network data, are being generated. Multi-dimensional information networks have become a common way to represent such data. However, these networks contain multiple types of nodes and attributes, so how to analyze them from multiple views and at multiple granularities, and how to mine the information hidden in them, has become a focus of current research. Graph online analytical processing (GraphOLAP) supports fast online analysis and querying of graph data. Building on existing GraphOLAP work, this paper proposes a new Graph-Cube framework tailored to the characteristics of multi-dimensional information networks. It introduces the concept of the meta-path and uses a main node to guide meta-path aggregation. Meta-paths are then used to guide roll-up and drill-down operations on the network, and two further Graph-Cube operations, attribute transform and homogeneous transform, are proposed. Finally, the paper discusses the materialization strategy and implements the framework on Spark. Experimental results on real and simulated datasets demonstrate the efficiency and effectiveness of the proposed framework.

Key words: GraphOLAP, Graph-Cube, meta-path, Spark

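Abstract: With the rapid development of the Internet and the continuing growth of computer applications, large amounts of graph data, especially social network data, are constantly being generated. Multi-dimensional information networks have become a general way of representing such graph data. However, in a multi-dimensional information network the node types are diverse and the node attributes differ, so how to analyze this data from multiple perspectives and at multiple granularities, and mine the information hidden in it, has become a focus of attention. Graph online analytical processing (GraphOLAP) can perform fast online analysis and query operations on graph data. Drawing on existing GraphOLAP results and targeting the characteristics of multi-dimensional information networks, a new data cube framework is proposed. The concept of a main node is introduced to guide the generation of meta-paths, meta-paths are used to guide roll-up and drill-down on the network, and attribute transform and homogeneous transform are proposed to enrich the OLAP operations. Finally, an optimized materialization strategy is discussed, the algorithms are implemented with the parallel computing framework Spark, and the effectiveness and efficiency of the framework are verified on multiple datasets.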

Key words: GraphOLAP, data cube, meta-path, Spark
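
The roll-up operation named in the abstract can be illustrated with a minimal sketch. This is not the paper's meta-path-guided algorithm or its Spark implementation; it is a generic attribute-based graph aggregation in plain Python, and the function name `roll_up` and the toy co-author data are assumptions made for illustration only.

```python
from collections import Counter


def roll_up(node_attr, edges):
    """Aggregate a graph to the level of one node attribute.

    node_attr: dict mapping node id -> attribute value (the roll-up dimension)
    edges: iterable of (source, target) pairs, treated as undirected
    Returns (node_counts, edge_weights) for the aggregated network.
    """
    # Each distinct attribute value becomes one super-node; its measure is
    # the number of original nodes that collapse into it.
    node_counts = Counter(node_attr.values())

    # Each original edge adds weight 1 to the edge between the super-nodes
    # of its endpoints (sorted so that (a, b) and (b, a) coincide).
    edge_weights = Counter()
    for u, v in edges:
        key = tuple(sorted((node_attr[u], node_attr[v])))
        edge_weights[key] += 1
    return node_counts, edge_weights


# Hypothetical toy data: authors labelled by research area, co-author edges.
area = {"a1": "DB", "a2": "DB", "a3": "ML", "a4": "ML"}
coauthor = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("a3", "a4")]

nodes, links = roll_up(area, coauthor)
print(dict(nodes))
print(dict(links))
```

Drill-down is the inverse direction: it returns from the aggregated network to a finer attribute level, which is why a cube framework has to decide which aggregated views to materialize in advance.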