- 2015
A Classification Method for Privacy-Preserving Data Using the Bayesian Rule
Abstract:
The retrievable general additive data perturbation (RGADP) algorithm protects privacy in a database but degrades the results of data mining on the perturbed data. To address this problem, a classification method for perturbed data based on the Bayesian rule is presented. The method analyzes the RGADP process and uses the Bayesian rule to estimate the probability distribution of the original data from the perturbed data. New data are then reconstructed from the estimated probability distribution and classified in order to improve classification accuracy. Experimental results show that the probability distribution estimated by the proposed method is close to that of the original data. Compared with classification on the perturbed data, classification accuracy on the reconstructed data is more than 4% higher on average and is closer to the accuracy obtained on the original data, so the method effectively reduces the effect of the perturbation algorithm on classification. Moreover, the running time of the method is proportional to the amount of data and to the number of data groups. Reconstructing 10,000 records takes less than 200 ms, so the method is also efficient.
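The estimate-then-reconstruct idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the steps of RGADP, so the sketch assumes plain additive Gaussian noise with a known standard deviation `sigma`, and it assumes the original attribute is discretized into groups represented by `midpoints`. The function names `estimate_distribution` and `reconstruct` are hypothetical. The estimate is obtained by repeatedly applying the Bayesian rule to update the group probabilities from the perturbed values.

```python
import math
import random


def gaussian_pdf(x, sigma):
    """Density of N(0, sigma^2) at x (the ASSUMED noise model, not RGADP)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_distribution(perturbed, midpoints, sigma, iterations=50):
    """Estimate group probabilities of the original data from perturbed values."""
    k = len(midpoints)
    probs = [1.0 / k] * k  # start from a uniform prior over the groups
    # Likelihood of each perturbed value given each group midpoint.
    like = [[gaussian_pdf(w - m, sigma) for m in midpoints] for w in perturbed]
    for _ in range(iterations):
        new_probs = [0.0] * k
        for row in like:
            joint = [l * p for l, p in zip(row, probs)]
            total = sum(joint)
            if total == 0.0:
                continue  # value too far from every group; skip it
            for j in range(k):
                new_probs[j] += joint[j] / total  # posterior by the Bayesian rule
        norm = sum(new_probs)
        if norm == 0.0:
            break
        probs = [p / norm for p in new_probs]
    return probs


def reconstruct(perturbed, midpoints, probs, sigma, rng):
    """Draw one reconstructed value per record from its posterior over groups."""
    out = []
    for w in perturbed:
        weights = [gaussian_pdf(w - m, sigma) * p for m, p in zip(midpoints, probs)]
        if sum(weights) == 0.0:
            weights = probs  # fall back to the estimated distribution
        out.append(rng.choices(midpoints, weights=weights, k=1)[0])
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    midpoints = [10.0, 30.0, 50.0, 70.0]   # hypothetical group midpoints
    true_probs = [0.1, 0.6, 0.2, 0.1]      # hypothetical original distribution
    original = rng.choices(midpoints, weights=true_probs, k=2000)
    perturbed = [x + rng.gauss(0.0, 5.0) for x in original]
    estimated = estimate_distribution(perturbed, midpoints, sigma=5.0)
    rebuilt = reconstruct(perturbed, midpoints, estimated, 5.0, rng)
    print("estimated distribution:", [round(p, 3) for p in estimated])
    print("first reconstructed values:", rebuilt[:5])
```

A classifier would then be trained on the reconstructed values instead of the perturbed ones. The cost of each iteration grows with the number of records times the number of groups, which is in line with the proportionality reported in the abstract, although the paper's actual procedure may differ.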