Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (4): 964-972. DOI: 10.3778/j.issn.1673-9418.2107035

• Network · Security •

PLDP: Personalized LDP for Collecting and Analyzing Multidimensional Data

GU Xiang, LI Yanhui, YUAN Ye, LI Xinling, WANG Guoren   

  1. School of Computer Science and Engineering, Northeastern University, Shenyang 110167, China
  2. School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China
  3. School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
  • Online: 2023-04-01 Published: 2023-04-01

Abstract: The spread of crowdsourcing applications has accelerated enterprise development, and the accompanying privacy leakage has become a focus of public concern. Existing local differential privacy (LDP) mechanisms mainly optimize utility for a single privacy level. As a result, some users refuse to share data because the protection offered is insufficient, while others receive more protection than they need. To meet users' differing privacy requirements, this paper proposes a personalized local differential privacy (PLDP) mechanism for collecting and analyzing multidimensional mixed-type data, which offers users multiple privacy protection levels. Specifically, it proposes a personalized user data perturbation framework that applies a personalized mean estimation algorithm to numerical data and a personalized frequency estimation algorithm to categorical data, and proves the confidentiality and effectiveness of the algorithms through theoretical analysis. In addition, it proposes a personalized sampling scheme that preprocesses the attribute labels according to the server's preferences and then performs biased sampling over the data dimensions according to those collection preferences. Experiments on two real datasets show that, compared with traditional LDP mechanisms, the proposed mechanism protects user data privacy while reducing the statistical error of collecting numerical and categorical data, and therefore provides a better balance between privacy protection and data availability.

Key words: local differential privacy (LDP), personalized local differential privacy (PLDP), numerical data, categorical data, crowdsourcing
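
Illustrative sketch: the abstract does not give the paper's algorithms, so the following is not the authors' PLDP mechanism. It only illustrates the general idea of frequency estimation for categorical data when each user picks their own privacy level. It assumes generalized randomized response (a standard LDP primitive swapped in here for illustration), a small hypothetical set of privacy budgets, and a simple group-size weighting at the server. The names `grr_perturb` and `estimate_frequencies` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def grr_perturb(value, k, epsilon, rng):
    """Client side: keep the true category with probability p,
    otherwise report one of the other k - 1 categories uniformly."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)  # index among the k - 1 remaining categories
    return other if other < value else other + 1


def estimate_frequencies(reports, k):
    """Server side: reports is a list of (epsilon, perturbed_value) pairs.
    Each privacy level is debiased with its own p and q, then the levels
    are combined weighted by the share of users at that level
    (an assumed, simple combination rule)."""
    by_level = defaultdict(list)
    for epsilon, value in reports:
        by_level[epsilon].append(value)

    n = len(reports)
    estimate = [0.0] * k
    for epsilon, values in by_level.items():
        p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
        q = 1.0 / (math.exp(epsilon) + k - 1)
        counts = Counter(values)
        for v in range(k):
            # Unbiased estimate of the frequency of v within this level.
            debiased = (counts[v] / len(values) - q) / (p - q)
            estimate[v] += (len(values) / n) * debiased
    return estimate
```

Each user would call `grr_perturb` locally with their chosen budget and send only the perturbed category together with the budget level; the server never sees the raw value. Weighting by group size keeps the combined estimate unbiased, although a variance-aware weighting would likely give lower error.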