Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (10): 1579-1590. DOI: 10.3778/j.issn.1673-9418.1608040

• Database Technology •

KD-TSS: Accurate Method for Private Spatial Decomposition

JIN Kaizhong1, ZHANG Xiaojian1+, PENG Huili1,2

  1. College of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  2. Henan Radio & Television University, Zhengzhou 450008, China
  • Online: 2017-10-01 Published: 2017-10-20

Abstract: Differentially private spatial decomposition based on KD-trees has attracted considerable research attention. The size of the spatial data and the amount of Laplace noise directly constrain the accuracy of the decomposition. Existing KD-tree-based methods struggle to accommodate both large-scale spatial data and the amount of noise at the same time. To address this, this paper first proposes SKD-Tree (sampling-based KD-Tree), a KD-tree decomposition method that satisfies differential privacy. SKD-Tree uses differentially private Bernoulli random sampling to draw spatial samples and partitions the samples instead of the full dataset. However, SKD-Tree still relies on the height of the KD-tree to control the Laplace noise, and choosing a suitable height heuristically is difficult: too large a height adds excessive noise to the nodes, while too small a height makes the partition too coarse-grained. To remedy this deficiency, this paper proposes a second method, KD-TSS (KD-Tree with sampling and SVT). KD-TSS uses the sparse vector technique (SVT) to decide whether a node of the tree should be split further, so the noise in a node no longer depends on the tree height. SKD-Tree and KD-TSS are compared with the existing methods KD-Stand and KD-Hybrid on large-scale real spatial datasets. The experimental results show that the two proposed methods achieve higher decomposition accuracy and more accurate range-query answers than their competitors.

Keywords: differential privacy, KD-Tree, private spatial decomposition, Bernoulli random sampling, sparse vector technique
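
The abstract describes two ingredients: Bernoulli sampling of the spatial points, and an SVT-style test that decides whether a node is split further, so that no tree height has to be fixed in advance. The Python sketch below illustrates that idea only; it is not the paper's algorithm. The function names, the threshold, the noise scales and the alternating-axis median split are all assumptions made for illustration. The exact median and the true-size stopping rule are not differentially private, and no privacy accounting is claimed.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def bernoulli_sample(points, rate, rng):
    """Keep each point independently with probability `rate`."""
    return [p for p in points if rng.random() < rate]


def build(points, noisy_threshold, eps_query, rng, depth=0):
    """Split a node only while its noisy count reaches the noisy threshold.

    Hypothetical sketch: the threshold is perturbed once by the caller and
    every node count is perturbed here, in the spirit of SVT.
    """
    noisy_count = len(points) + laplace(1.0 / eps_query, rng)
    node = {"count": noisy_count, "children": []}
    # Stop on a failed SVT-style test, or when the node cannot be halved.
    # (Checking the true size is a simplification and is not private.)
    if noisy_count < noisy_threshold or len(points) < 2:
        return node
    axis = depth % 2  # alternate between the x and y coordinates
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # exact median: NOT private, used for brevity
    node["children"] = [
        build(ordered[:mid], noisy_threshold, eps_query, rng, depth + 1),
        build(ordered[mid:], noisy_threshold, eps_query, rng, depth + 1),
    ]
    return node


def count_leaves(node):
    """Number of leaf regions in the decomposition."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


# Usage with made-up data and illustrative (uncalibrated) parameters.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(2000)]
sample = bernoulli_sample(points, rate=0.5, rng=rng)
threshold = 100 + laplace(2.0 / 0.5, rng)  # threshold perturbed once
tree = build(sample, threshold, eps_query=0.5, rng=rng)
print(count_leaves(tree))
```

In this sketch the depth of each branch is determined by the data: dense regions keep passing the noisy test and are refined, while sparse regions stop early. That is the behaviour the abstract attributes to replacing a fixed tree height with SVT.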