Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2014, Vol. 8, Issue (7): 778-789. DOI: 10.3778/j.issn.1673-9418.1403028

• Database Technology •

Distributed Similarity Query Method on Data with Relation Information (面向关联关系数据的分布式相似性查询方法)

ZHU Mingdong+ (朱命冬), SHEN Derong (申德荣), XIE Ning (解宁), YU Ge (于戈), KOU Yue (寇月), NIE Tiezheng (聂铁铮)

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110004, China
  • Online: 2014-07-01  Published: 2014-07-02

Abstract: Data with relation information are ubiquitous in environments such as social networks, e-commerce platforms, and scientific databases, and similarity queries over such data are a common operation in many applications. With the development and spread of social networking, e-commerce, and cloud computing, data with relation information are growing rapidly, and processing similarity queries on them has become an active research topic in the database field. Against this background, this paper proposes a decision-tree-based distributed similarity query method for data with relation information. The method computes similarity according to the importance of attributes and can stop the computation once the required precision is reached, which reduces the computation cost while preserving accuracy. The paper also proposes two algorithms for computing decision trees over large data in a distributed environment; they incur less communication cost than existing methods, and their accuracy is guaranteed by probability theory. Extensive experiments verify the effectiveness and efficiency of the algorithms.

Key words: similarity query, relation information, decision tree, distributed query method
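
The abstract says that similarity is computed by attribute importance and that the computation can stop once a required precision is reached. The abstract gives no algorithmic detail, so the following is only a minimal sketch of that general idea, not the paper's method. It assumes importance is given as non-negative numeric weights, that per-attribute similarity is a simple exact match, and that "precision" means the width of the interval bounding the final score. The names `early_stop_similarity`, `weights`, and `tolerance` are hypothetical, and the decision-tree construction and the distributed algorithms are not reproduced here.

```python
from typing import Dict, Tuple


def exact_match(x: object, y: object) -> float:
    """Assumed per-attribute similarity: 1.0 if values are equal, else 0.0."""
    return 1.0 if x == y else 0.0


def early_stop_similarity(
    a: Dict[str, object],
    b: Dict[str, object],
    weights: Dict[str, float],
    tolerance: float = 0.0,
) -> Tuple[float, float, int]:
    """Weighted similarity with early termination (illustrative only).

    Attributes are visited from most to least important. Because each
    per-attribute similarity lies in [0, 1], the final score must lie in
    [score, score + remaining], where `remaining` is the normalized weight
    not yet examined. The loop stops once that interval is no wider than
    `tolerance`. Returns (lower_bound, upper_bound, attributes_examined).
    """
    total = sum(weights.values())
    score = 0.0
    remaining = 1.0
    used = 0
    for attr in sorted(weights, key=weights.get, reverse=True):
        if remaining <= tolerance:
            break  # the bound is already tight enough; skip the rest
        w = weights[attr] / total
        score += w * exact_match(a[attr], b[attr])
        remaining -= w
        used += 1
    return score, score + max(remaining, 0.0), used


# Hypothetical records and importance weights, for illustration only.
item_a = {"category": "book", "brand": "acme", "color": "red", "size": "M"}
item_b = {"category": "book", "brand": "apex", "color": "red", "size": "L"}
importance = {"category": 4, "brand": 2, "color": 1, "size": 1}

print(early_stop_similarity(item_a, item_b, importance, tolerance=0.25))
# (0.5, 0.75, 2)
```

With `tolerance=0.25` only the two most important attributes are compared, and the exact score (0.625, obtained with `tolerance=0.0`) falls inside the returned interval. Ordering by importance is what makes the interval shrink quickly, which is the source of the savings in this sketch.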