计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2013, Vol. 7, Issue (1): 1-13. DOI: 10.3778/j.issn.1673-9418.1209024

• Surveys and Explorations •

Research Advance on Similarity Join Queries

PANG Jun, GU Yu, XU Jia, YU Ge

  1. School of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Publication date: 2013-01-01 Release date: 2012-12-29

Abstract: A similarity join query finds pairs of similar data objects. It has a wide range of applications, such as near-duplicate Web page detection, entity resolution, data cleaning, and similar-image retrieval, and it is currently one of the hot topics in big data processing. This paper discusses the challenges that similarity join queries face, classifies existing similarity join queries according to different criteria, summarizes and compares existing string, set, vector, and graph similarity join algorithms, and explores future research priorities and development trends.

Key words: similarity join query, similarity metrics, massive data
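
As an illustration of what the abstract means by finding similar pairs of data objects, the following is a minimal sketch of a set similarity self-join. It is a brute-force comparison of every pair and is not taken from the paper or from any of the algorithms it surveys. The choice of Jaccard similarity, the threshold value, and the names `jaccard`, `similarity_self_join`, and `docs` are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_self_join(records: dict, threshold: float) -> list:
    """Return (id1, id2, score) for every pair whose Jaccard score >= threshold.

    Hypothetical helper: compares all pairs, so the cost grows quadratically
    with the number of records.
    """
    results = []
    # Sort by identifier so the output order is deterministic.
    items = sorted(records.items(), key=lambda kv: kv[0])
    for (id1, s1), (id2, s2) in combinations(items, 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            results.append((id1, id2, score))
    return results


# Made-up sample data: each record is a set of tokens.
docs = {
    "d1": {"similarity", "join", "query"},
    "d2": {"similarity", "join", "queries"},
    "d3": {"graph", "database"},
}
pairs = similarity_self_join(docs, threshold=0.5)
print(pairs)
```

With this sample data only `d1` and `d2` share tokens, so they are the only candidate pair. Because every pair is compared, this approach is unlikely to scale to the massive data the keywords mention.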