计算机科学与探索 ›› 2019, Vol. 13 ›› Issue (3): 394-407.DOI: 10.3778/j.issn.1673-9418.1807047

• 数据库技术 • 上一篇    下一篇

多方强隐私保护记录链接方法

佟丹妮+,申德荣,韩姝敏,聂铁铮,寇  月,于  戈   

  1. 东北大学 计算机科学与工程学院,沈阳 110819
  • 出版日期:2019-03-01 发布日期:2019-03-11

Multi-Party Strong-Privacy-Preserving Record Linkage Method

TONG Danni+, SHEN Derong, HAN Shumin, NIE Tiezheng, KOU Yue, YU Ge   

  1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Online:2019-03-01 Published:2019-03-11

摘要: 链接跨组织数据库中表示同一实体的记录,同时保护存储在这些数据库中实体的隐私,是安全有效地整合多源数据资源的核心技术之一。然而,已有隐私保护记录链接(privacy-preserving record linkage,PPRL)技术中的分块方法不能同时保证高查全率和高查准率,强隐私性的匹配方法存在时间代价过大的不足,且对多于两个数据库间的匹配研究很少。针对上述问题,提出了一种多方强隐私保护记录链接方法(multi-party strong-privacy-preserving record linkage,MP-SPPRL)。首先,提出了一种局部敏感哈希(locality sensitive Hashing,LSH)结合后缀分块的二次分块方法,并引入分块分散度调节两次分块,在保证MP-SPPRL高查全率的前提下有效地提高了查准率;接着,利用滑动窗口合并分块生成候选记录组,保证MP-SPPRL的容错率;然后,采用基于同态加密的Hamming距离计算方法,设计了一种适用于大型数据的基于安全多方计算(secure multi-party computation,SMC)的可伸缩多方记录匹配算法,通过缩减加密记录数量和提前终止不可能匹配的候选记录组的距离计算,显著降低了匹配的时间代价,提高了MP-SPPRL的效率;最后,通过大量实验验证了MP-SPPRL的高查全率、高查准率和高效性。

关键词: 记录链接, 隐私保护, 二次分块, 记录匹配

Abstract: Linking records that represent the same entity among different databases while protecting the privacy of entities stored in those databases is one of the core technologies for the safe and efficient integration of multi-party data resources. However, the existing blocking methods for privacy-preserving record linkage (PPRL) cannot ensure the recall and the precision at the same time, the strong privacy matching methods cost a lot of time and there are few matching studies for more than two databases. To address these problems, this paper proposes a multi-party strong-privacy-preserving record linkage method (MP-SPPRL). Firstly, a double blocking method with locality sensitive Hashing (LSH) blocking and suffix blocking is proposed, which uses dispersion to adjust the double blocking process. The method effectively improves the recall and precision of MP-SPPRL. Next, a sliding window is used to merge the blocks to generate candidate record groups for ensuring the fault-tolerant rate of MP-SPPRL. Then, a scalable multi-party matching algorithm based on secure multi-party computation (SMC) is designed by using the homomorphic Hamming distance calculation for large data. The algorithm significantly reduces the record matching cost by cutting encrypted records and stopping the distance calculation of candidate record groups that cannot be matched. Finally, the empirical evaluation demonstrates the high recall, high precision and high efficiency of MP-SPPRL.

Key words: record linkage, privacy preserving, double blocking, record matching