Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2007, Vol. 1, Issue 2: 146-159.

• Academic Research •

An approximate approach to monitoring nearest neighbors in real time

JIN Cheqing1+, CHONG Zhihong2, ZHOU Aoying3

  1. Department of Computer Science, East China University of Science and Technology, Shanghai 200237, China
  2. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
  3. Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China

  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-08-20 Published: 2007-08-20
  • Contact: JIN Cheqing

Abstract: The main challenge in processing high-speed data in a distributed environment is producing high-quality query results while consuming few network resources. This paper studies nearest-neighbor queries in distributed environments and proposes a novel filter-based method that answers not only exact queries but also five kinds of approximate queries. The method installs a smart filter at each remote site and reduces the volume of transmitted data by continuously adjusting the range that each filter monitors. Theoretical analysis and experiments on synthetic and real datasets indicate that the new method performs well.

Key words: nearest-neighbor query, distributed environment, approximate algorithm, data stream
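
The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the general idea of range filters at remote sites, not the authors' algorithm. It assumes one-dimensional values, a single query point, and a fixed filter half-width (the paper adjusts ranges continuously). The names `RangeFilter`, `Coordinator`, and `half_width` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RangeFilter:
    """Filter at one remote site (hypothetical; 1-D values, fixed half-width)."""
    center: float      # last value reported to the coordinator
    half_width: float  # updates within center +/- half_width are suppressed

    def observe(self, value: float) -> bool:
        """Return True if `value` must be sent to the coordinator."""
        if abs(value - self.center) <= self.half_width:
            return False          # inside the monitored range: stay silent
        self.center = value       # outside: report and re-center the range
        return True


class Coordinator:
    """Tracks an approximate nearest neighbor of `query` from cached values."""

    def __init__(self, query: float, initial: Dict[str, float], half_width: float):
        self.query = query
        self.cache = dict(initial)  # last reported value per site
        self.filters = {s: RangeFilter(v, half_width) for s, v in initial.items()}
        self.messages = 0           # number of transmitted updates

    def update(self, site: str, value: float) -> None:
        """Simulate a new reading at `site`; transmit only if the filter fires."""
        if self.filters[site].observe(value):
            self.cache[site] = value
            self.messages += 1

    def nearest(self) -> str:
        """Site whose cached value is closest to the query point."""
        return min(self.cache, key=lambda s: abs(self.cache[s] - self.query))


coord = Coordinator(query=10.0, initial={"a": 4.0, "b": 20.0}, half_width=1.0)
coord.update("a", 4.5)    # within [3, 5]: suppressed, no message
coord.update("b", 12.0)   # outside [19, 21]: transmitted
print(coord.nearest(), coord.messages)  # prints "b 1"
```

In this sketch every cached value stays within `half_width` of the site's true value, so the true distance of the reported site exceeds the true nearest distance by at most twice `half_width`. A wider range saves messages at the cost of a looser answer, which is the trade-off that a range-adjustment policy would tune.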