|
满足本地差分隐私的动态域键值数据聚合机制
|
Abstract:
本地差分隐私具有不需要可信第三方、交互少、运行效率高等优点,近年来受到了广泛关注,而键值数据在实际生活里的各个软件中也得到了广泛的应用,键值数据的统计分析也是研究的重点。然而,现有本地差分隐私估计机制未能考虑实际应用中数据域动态变化的情况,将所有数据同等对待,这会浪费大量带宽,且导致估计结果准确度偏低的情况。针对这一问题,提出了键域独立的方法和机制,考虑到数据域动态新增和删减等情况,对不同数据按实时键域情况分批处理。理论分析证实,相对于现有的本地差分隐私机制,所提方案能够对数据域随时变化的键值数据实现基本相同的保护效果,并且在通信成本和通信效率上都有一定的优势。最后,在数据集上评估了新的方案,实验结果证明了所提的机制能够有效应对数据域随时发生变化的情况,相较于朴素地将所有数据一视同仁的情况,能够有效降低估计误差,提升数据效用。
Local differential privacy boasts several merits, including the absence of a trusted third party, reduced interaction, and high operational efficiency. It has garnered extensive attention in recent years. Moreover, key-value data has found wide application in real life, and the statistical analysis of key-value data constitutes a key research focus. Nevertheless, existing local differential privacy estimation mechanisms fail to take into account the situation where the data domain undergoes changes at any time in practical applications. They treat all data uniformly, which not only wastes a considerable amount of bandwidth but also leads to relatively low accuracy of estimation results. In response to this issue, a method and mechanism of key-domain independence are put forward, considering scenarios such as real-time additions and deletions of the data domain and processing different data in batches based on the real-time key-domain situation. Theoretical analysis confirms that, in contrast to the existing local differential privacy mechanisms, the proposed scheme can achieve precisely the same protective effect for key-value data with a changing data domain at any time, and has certain advantages in communication cost and communication efficiency. Finally, the new scheme was evaluated on a dataset, and the experimental results show that the proposed mechanism can effectively address the situation where the data domain changes in real time, and can effectively reduce the estimation error compared with the naive approach of treating all data equally, thereby improving data utility.
[1] | Dwork, C. (2008) Differential Privacy: A Survey of Results. In: Lecture Notes in Computer Science, Springer, 1-19. https://doi.org/10.1007/978-3-540-79228-4_1 |
[2] | Dwork, C., McSherry, F., Nissim, K. and Smith, A. (2006) Calibrating Noise to Sensitivity in Private Data Analysis. In: Lecture Notes in Computer Science, Springer, 265-284. https://doi.org/10.1007/11681878_14 |
[3] | Duchi, J.C., Jordan, M.I. and Wainwright, M.J. (2013) Local Privacy and Statistical Minimax Rates. 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, Berkeley, 26-29 October 2013, 429-438. https://doi.org/10.1109/focs.2013.53 |
[4] | Erlingsson, Ú., Pihur, V. and Korolova, A. (2014) RAPPOR. Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, New York, 1054-1067, 3-7 November 2014. https://doi.org/10.1145/2660267.2660348 |
[5] | Ye, Q., Hu, H., Meng, X., Zheng, H., Huang, K., Fang, C., et al. (2023) Privkvm: Revisiting Key-Value Statistics Estimation with Local Differential Privacy. IEEE Transactions on Dependable and Secure Computing, 20, 17-35. https://doi.org/10.1109/tdsc.2021.3107512 |
[6] | Gu, X.L., Li, M., Cheng, Y.Q., et al. (2020) PCKV: Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility. 29th USENIX Security Symposium (USENIX Security 20), Boston, 12-14 August 2020, 967-984. |
[7] | Sun, L., Zhao, J., Ye, X.J., et al. (2019) Conditional Analysis for Key-Value Data with Local Differential Privacy. |
[8] | 魏立斐, 陈聪聪, 张蕾, 等. 机器学习的安全问题及隐私保护[J].计算机研究与发展, 2020, 57(10): 2066-2085. |
[9] | 郭娟娟, 王琼霄, 许新, 等. 安全多方计算及其在机器学习中的应用[J]. 计算机研究与发展, 2021, 58(10): 2161-2186. |
[10] | Wang, S., Qian, Y., Du, J., Yang, W., Huang, L. and Xu, H. (2020) Set-Valued Data Publication with Local Privacy. Proceedings of the VLDB Endowment, 13, 1234-1247. https://doi.org/10.14778/3389133.33891407 |