计算机科学与探索 ›› 2010, Vol. 4 ›› Issue (11): 1010-1018.DOI: 10.3778/j.issn.1673-9418.2010.11.006

• 学术研究 • 上一篇    下一篇

k-匿名模型中准标识符最佳值的求解问题*

王丹丽1+, 刘国华1,2,3, 宋金玲1,4, 李芳玲5   

  1. 1. 燕山大学 信息科学与工程学院 计算机科学与工程系, 河北 秦皇岛 066004
    2. 东华大学 计算机科学与技术学院, 上海 201620
    3. 南京大学 计算机软件新技术国家重点实验室, 南京 210093
    4. 河北科技师范学院 计算机系, 河北 秦皇岛 066004
    5. 山东理工职业学院 信息工程系, 山东 济宁 272017
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2010-11-01 发布日期:2010-11-01
  • 通讯作者: 王丹丽

Problem of Finding the Optimal Value on Quasi-Identifier for k-Anonymity Model*

WANG Danli1+, LIU Guohua1,2,3, SONG Jinling1,4, LI Fangling5   

  1. 1. Department of Computer Science and Engineering, School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
    2. College of Computer Science and Technology, Donghua University, Shanghai 201620, China
    3. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
    4. Department of Computer, Hebei Normal University of Science & Technology, Qinhuangdao, Hebei 066004, China
    5. Department of Information Technology, Shandong Polytechnic Vocational College, Jining, Shandong 272017, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2010-11-01 Published:2010-11-01
  • Contact: WANG Danli

摘要: 准标识符值是影响k-匿名表隐私保护程度和数据质量的关键因素。如何在给定各个准标识符属性泛化树的情况下求解准标识符最佳值, 对匿名表在满足隐私保护要求的同时达到最高的数据质量具有重要意义。针对这一问题, 证明了准标识符最佳值的求解问题是NP-完全问题, 提出了准标识符最佳值的近似求解方法, 并给出了准标识符最佳值的近似求解算法; 最后, 对算法进行了正确性证明和时间复杂度分析。

关键词: k-匿名, 数据质量, 泛化树, 准标识符最佳值, NP-完全

Abstract: The value on quasi-identifier is a key factor to impact the degree of privacy protection and data quality of k-anonymous tables. After generalization trees of quasi-identifier attributes have been generated, how to find the optimal value on quasi-identifier is very important for anonymous table to meet the privacy protection requirements and achieve the highest data quality. To solve this, firstly, the problem of finding the optimal value on quasi- identifier is proved to be a NP-complete problem. Secondly, the approximate method of finding the optimal value on quasi-identifier is presented, and the approximate algorithm for finding the optimal value on quasi-identifier is pro-posed. Lastly, the correctness of the algorithm is proved and the time complexity of the algorithm is analyzed.

Key words: k-anonymity, data quality, generalization tree, optimal value on quasi-identifier, NP-complete

中图分类号: