Journal of Frontiers of Computer Science and Technology, 2013, Vol. 7, Issue (9): 811-818. DOI: 10.3778/j.issn.1673-9418.1303047

• Academic Research •

High-Dimensional Distributed Indexing Based on Locality-Sensitive Hashing

LIN Chaohui1, YU Junqing1,2+, HE Yunfeng1, GUAN Tao1, AI Liefu1

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  2. Center of Network and Computation, Huazhong University of Science and Technology, Wuhan 430074, China
  • Online: 2013-09-01  Published: 2013-09-04

Abstract: Content-based image retrieval systems face two problems with high-dimensional indexes: the index occupies a large amount of storage, and building it is computationally expensive. Based on a systematic analysis of the locality-sensitive hashing (LSH) index and the Hadoop distributed system, this paper improves the existing computational model and index structure for high-dimensional indexing. Exploiting the characteristics of the LSH index, the existing LSH index is restructured into a loosely coupled structure, and the index files are deployed across multiple query nodes to support highly concurrent index queries. The index is built in parallel with the MapReduce distributed computing model, which improves the efficiency of index construction, and a distributed database stores the massive high-dimensional index data, which enhances the scalability of the system. The experimental results show that the proposed method is feasible.

Key words: locality-sensitive hashing, distributed indexing, content-based image retrieval
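
The abstract does not give the details of the hash functions or of the MapReduce jobs, so the following is only a minimal, single-process sketch of the general idea, not the authors' implementation. It assumes a standard p-stable (Gaussian random projection) LSH scheme, and it stands in for Hadoop and the distributed database with plain Python generators and dictionaries. All function names and parameters (`make_table_params`, `bucket_key`, `map_phase`, `reduce_phase`, `query`, `K`, `W`) are hypothetical and chosen for illustration. Because each hash table is built and queried independently of the others, each shard could in principle be placed on a different query node, which is one way to read "loosely coupled".

```python
import random
from collections import defaultdict

def make_table_params(dim, k, w, seed):
    """Draw k Gaussian projections (a, b) for one LSH table (assumed p-stable scheme)."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(k)]

def bucket_key(vector, params, w):
    """Quantize each projection floor((a.x + b) / w) and join them into one key."""
    return tuple(
        int((sum(a_i * x_i for a_i, x_i in zip(a, vector)) + b) // w)
        for a, b in params
    )

def map_phase(records, tables, w):
    """Map step: emit ((table_id, bucket), item_id) once per table for every record."""
    for item_id, vector in records:
        for table_id, params in enumerate(tables):
            yield (table_id, bucket_key(vector, params, w)), item_id

def reduce_phase(pairs, num_tables):
    """Reduce step: group item ids by bucket, producing one independent shard per table."""
    shards = [defaultdict(list) for _ in range(num_tables)]
    for (table_id, bucket), item_id in pairs:
        shards[table_id][bucket].append(item_id)
    return shards

def query(vector, tables, shards, w):
    """Look up the query's bucket in every shard and merge the candidate ids."""
    candidates = set()
    for params, shard in zip(tables, shards):
        candidates.update(shard.get(bucket_key(vector, params, w), []))
    return candidates

# Illustrative settings only; the paper's actual parameters are not given in the abstract.
DIM, K, W, NUM_TABLES = 8, 4, 4.0, 3
tables = [make_table_params(DIM, K, W, seed) for seed in range(NUM_TABLES)]

data_rng = random.Random(0)
records = [(i, [data_rng.gauss(0.0, 1.0) for _ in range(DIM)]) for i in range(100)]

shards = reduce_phase(map_phase(records, tables, W), NUM_TABLES)
candidates = query(records[0][1], tables, shards, W)
```

In this sketch a stored vector always appears among its own query candidates, since it hashes to the same bucket it was inserted into. Candidates from different tables are merged with a set union, so a real system would still need to rank them by true distance to the query.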