计算机科学与探索 (Journal of Frontiers of Computer Science and Technology) ›› 2012, Vol. 6 ›› Issue (11): 961-973. DOI: 10.3778/j.issn.1673-9418.2012.11.001

• Academic Research •

大数据模式分解的隐私保护研究

李  宁, 朱  青+   

  1. 中国人民大学 信息学院 计算机系,北京 100872
  • Publication date: 2012-11-01 Release date: 2012-11-02

Privacy Preserving Based on Model Division for Large Data

LI Ning, ZHU Qing+   

  1. Department of Computer Science, School of Information, Renmin University of China, Beijing 100872, China
  • Online: 2012-11-01 Published: 2012-11-02

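Abstract: Most existing privacy protection techniques overlook the special associations between the different values of sensitive attributes and the quasi-identifier attributes. Moreover, the multifaceted data privacy requirements of different fields mean that published anonymous data must satisfy composite privacy constraints. This paper analyzes similar sensitive attribute values and composite privacy constraints, and proposes a privacy protection algorithm based on model division for big data and on cluster analysis. It presents a method that clusters sensitive attribute values to protect similar values, and assigns different weights to sensitive attributes so that important attributes are retained. A utility matrix with a three-dimensional irregular matrix structure is used to obtain anonymous data of high accuracy and to realize the model division of the anonymous data. Extensive experiments on real data sets show that the algorithm markedly improves the data accuracy rate and the data error correction rate, and reduces the approximate attack rate.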

Keywords: data privacy protection, attribute clustering, model division

Abstract: Most existing privacy-preserving techniques ignore the special associations between the different values of a sensitive attribute and the quasi-identifier attributes. In addition, because different fields impose multiple requirements on data privacy, published anonymous data must satisfy composite privacy constraints. By analyzing similar sensitive attribute values and composite privacy constraints, this paper proposes an efficient clustering algorithm based on model division for large data privacy preserving. First, it clusters sensitive attribute values to protect similar ones, and assigns different weights so that important quasi-identifier attributes are retained. Second, a utility matrix in the form of a three-dimensional irregular matrix is used to obtain anonymous data with high accuracy and to achieve the model division of the anonymous data. Finally, experimental results on real data sets show that the proposed algorithm clearly improves the data accuracy rate and the data error correction rate, and lowers the approximate attack rate.

Key words: data privacy preserving, attributes clustering, model division