Research on Differentially Private Trajectory Data Publishing
-
摘要:
位置轨迹大数据的安全分享、发布需求离不开位置轨迹隐私保护技术支持。在差分隐私出现之前,K-匿名及其衍生模型为位置轨迹隐私保护提供了一种量化评估的手段,但其安全性严重依赖于攻击者所掌握的背景知识,当有新的攻击出现时模型无法提供完善的隐私保护。差分隐私技术的出现有效地弥补了上述问题,越来越多地应用于轨迹数据隐私发布领域中。该文对基于差分隐私理论的轨迹隐私保护技术进行了研究与分析,重点介绍了差分隐私模型下位置直方图、轨迹直方图等空间统计数据发布方法,差分隐私模型下轨迹数据集发布方法,以及连续轨迹实时发布隐私保护模型。与此同时,在对现有方法对比分析的基础上,提出了未来的重点发展方向。
Abstract:Securely sharing and publishing location trajectory data relies on support of location privacy protection technology. Prior to the advent of differential privacy, K-anonymity and its derived models provide a means of quantitative assessment of location-trajectory privacy protection. However, its security relies heavily on the background knowledge of the attacker, and the model can not provide perfect privacy protection when a new attack occurs. Differential privacy effectively compensates for the above problems, and it proves the level of privacy protection based on rigorous mathematical theory and is increasingly used in the field of trajectory data privacy publishing. Therefore, the trajectory privacy protection technology based on differential privacy theory is studied and analyzed, and the methods of spatial statistical data publishing are introduced such as location histogram and trajectory histogram, the method of trajectory data set publishing and the model of continuous real-time location release privacy protection. At the same time, the existing methods are compared and analyzed, the key development directions are put forward in the future.
-
Key words:
- Privacy preserving /
- Differential privacy /
- Location big data /
- Trajectory big data /
- Data publishing
-
表 1 轨迹序列数据集
序号 路径 1 L1→L2→L3 2 L1→L2 3 L3→L2→L1 4 L1→L2→L4 5 L1→L2→L3 6 L3→L2 7 L1→L2→L4→L1 8 L3→L1 表 2 基于差分隐私保护的轨迹信息发布方法比较
类型 数据发布方法 隐私保护模型 隐私保护机制 发布数据类型 时间复杂度分析 空间
直方图四叉索引树,KD-索引树,K叉平均树,PrivTree $ \left( {\varepsilon ,\delta } \right) - {\rm{DP}}$ Laplace机制 位置直方图 噪音的全局敏感度和树高相关,构建四叉树结构的复杂度为O(n·lgn) EH-DP, privSH ε-DP ε-DP Laplace机制和指数机制 轨迹直方图 构建EH数据结构的复杂度为O (mn) 位置熵(LE) $ \left( {\varepsilon ,\delta } \right) - {\rm{DP}}$ Laplace机制 位置熵直方图 用到了平滑敏感度替代全局敏感度,添加噪音较少,计算所有地理位置的熵需要遍历用户的地理位置,复杂度为O(mn) 轨迹
集合
发布前缀树,n-gram, DPT, PST ε-DP Laplace机制 移动轨迹数据集 构建前缀树的复杂度为O(mn·lgm) K-means聚类,OPT K-means聚类 ε-DP Laplace机制和指数机制 移动轨迹数据集 若聚类为k,迭代次数平均为t,由于采用差分隐私指数机制,复杂度为O (mktnk) 连续
轨迹
发布δ-位置集合,(δ, r)-位置集合 ε-DP Laplace机制 单个轨迹点 假设攻击者知道用户的移动模式,使噪音添加的更合理。时间复杂为O(m|$\Delta $X|) (其中|$\Delta $X|为构建位置集合大小) Wasserstein机制,MQM机制,FGS-PufferFish机制 ε-PufferFish隐私 Laplace机制 单条轨迹 时间复杂度为O (nkm2·lgm) (其中k为PufferFish定义的空间关联的最大区域) -
BAO Jie, HE Tianfu, RUAN Sijie, et al. Planning bike lanes based on sharing-bikes’ trajectories[C]. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada, 2017: 1377–1386. YUAN Jing, ZHENG Yu, ZHANG Chengyang, et al. T-drive: Driving directions based on taxi trajectories[C]. The 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, USA, 2010: 99–108. TOCKAR A. Riding with the stars: Passenger privacy in the NYC Taxicab dataset[EB/OL]. https://research.neustar.biz/author/atockar/, 2018. W-Pwn. 健身APP泄露军事机密, 包括中国南海[EB/OL]. https://zhuanlan.zhihu.com/p/33405626, 2018. CHEN Zhenyu, FU Yanyan, ZHANG Min, et al. The de-anonymization method based on user spatio-temporal mobility trace[C]. The 19th International Conference on Information and Communications Security, Beijing, China, 2017: 459–471. ABUL O, BONCHI F, and NANNI M. Never walk alone: Uncertainty for anonymity in moving objects databases[C]. The 24th IEEE International Conference on Data Engineering, Cancun, Mexico, 2008: 376–385. TERROVITIS M, POULIS G, MAMOULIS N, et al. Local suppression and splitting techniques for privacy preserving publication of trajectories[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(7): 1466–1479. doi: 10.1109/TKDE.2017.2675420 YAO Lin, WANG Xinyu, WANG Xin, et al. Publishing sensitive trajectory data under enhanced l-diversity model[C]. The 20th IEEE International Conference on Mobile Data Management, Hong Kong, China, 2019: 160–169. 冯登国. 大数据安全与隐私保护[M]. 北京: 清华大学出版社, 2018: 194–219. DWORK C. Differential privacy[C]. The 33rd International Colloquium on Automata, Languages and Programming, Venice, Italy, 2006: 1–12. doi: 10.1007/11787006_1. ZHU Tianqing, LI Gang, ZHOU Wanlei, et al. Differentially private data publishing and analysis: A survey[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(8): 1619–1638. doi: 10.1109/TKDE.2017.2697856 MCSHERRY F and TALWAR K. Mechanism design via differential privacy[C]. The 48th Annual IEEE Symposium on Foundations of Computer Science, Providence, USA, 2007: 94–103. MCSHERRY F D. Privacy integrated queries: An extensible platform for privacy-preserving data analysis[C]. The 2009 ACM SIGMOD International Conference on Management of Data, Rhode Island, USA, 2009: 19–30. DWORK C, KENTHAPADI K, MCSHERRY F, et al. Our data, ourselves: Privacy via distributed noise generation[C]. The 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Peterbury, Russia, 2006: 486–503. KIFER D and MACHANAVAJJHALA A. No free lunch in data privacy[C]. The 2011 ACM SIGMOD International Conference on Management of Data, Athens, Greece, 2011: 193–204. GEHRKE J, LUI E, and PASS R. Towards privacy for social networks: A zero-knowledge based definition of privacy[C]. The 8th Conference on Theory of Cryptography, Providence, USA, 2011: 432–449. KIFER D and MACHANAVAJJHALA A. Pufferfish: A framework for mathematical privacy definitions[J]. ACM Transactions on Database Systems, 2014, 39(1): Article No.3. doi: 10.1145/2514689 DWORK C. A firm foundation for private data analysis[J]. Communications of the ACM, 2011, 54(1): 86–95. doi: 10.1145/1866739.1866758 DWORK C, MCSHERRY F, NISSIM K, et al. Calibrating noise to sensitivity in private data analysis[C]. The 3rd Theory of Cryptography Conference, New York, USA, 2006: 265–284. QARDAJI W, YANG Weining, and LI Ninghui. Differentially private grids for geospatial data[C]. The 29th IEEE International Conference on Data Engineering, Brisbane, Australia, 2013: 757–768. CORMODE G, PROCOPIUC C, SRIVASTAVA D, et al. Differentially private spatial decompositions[C]. The 28th IEEE International Conference on Data Engineering, Washington, USA, 2012: 20–31. ZHANG Jun, XIAO Xiaokui, and XIE Xing. PrivTree: A differentially private algorithm for hierarchical decompositions[C]. The 2016 International Conference on Management of Data, San Francisco, USA, 2016: 155–170. HAY M, RASTOGI V, MIKLAU G, et al. Boosting the accuracy of differentially private histograms through consistency[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 1021–1032. XIAO Yonghui, XIONG Li, and YUAN Chun. Differentially private data release through multidimensional partitioning[C]. The 7th Workshop on Secure Data Management, Singapore, 2010: 150–168. XIE Hairuo, TANIN E, and KULIK L. Distributed histograms for processing aggregate data from moving objects[C]. 2007 International Conference on Mobile Data Management, Mannheim, Germany, 2007: 152–157. XIE Hairuo, TANIN E, KULIK L, et al. Euler histogram tree: A spatial data structure for aggregate range queries on vehicle trajectories[C]. The 7th ACM SIGSPATIAL International Workshop on Computational Transportation Science, Dallas/Fort Worth, USA, 2014: 18–24. GHANE S, KULIK L, and RAMAMOHANARAO K. Publishing spatial histograms under differential privacy[C]. The 30th International Conference on Scientific and Statistical Database Management, Bozen-Bolzano, Italy, 2018: Article No.14. HARDT M, LIGETT K, and MCSHERRY F. A simple and practical algorithm for differentially private data release[C]. Advances in Neural Information Processing Systems, Lake Tahoe, USA, 2012: 2339–2347. TO H, NGUYEN K, and SHAHABI C. Differentially private publication of location entropy[C]. The 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Burlingame, USA, 2016: Article No. 35. CHEN Rui, FUNG B C M, DESAI B C, et al. Differentially private transit data publication: A case study on the montreal transportation system[C]. The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012: 213–221. CHEN Rui, ACS G, and CASTELLUCCIA C. Differentially private sequential data publication via variable-length n-grams[C]. The 2012 ACM Conference on Computer and Communications Security, Raleigh, USA, 2012: 638–649. HE Xi, CORMODE G, MACHANAVAJJHALA A, et al. DPT: Differentially private trajectory synthesis using hierarchical reference systems[J]. The VLDB Endowment, 2015, 8(11): 1154–1165. doi: 10.14778/2809974.2809978 HE Xi, RAVAL N, and MACHANAVAJJHALA A. A demonstration of VisDPT: Visual exploration of differentially private trajectories[J]. The VLDB Endowment, 2016, 9(13): 1489–1492. doi: 10.14778/3007263.3007291 HUA Jingyu, GAO Yue, and ZHONG Sheng. Differentially private publication of general time-serial trajectory data[C]. 2015 IEEE Conference on Computer Communications, Hong Kong, China, 2015: 549–557. doi: 10.1109/INFOCOM.2015.7218422. LI Meng, ZHU Liehuang, ZHANG ZIjian, et al. Achieving differential privacy of trajectory data publishing in participatory sensing[J]. Information Sciences, 2017, 400/401: 1–13. doi: 10.1016/j.ins.2017.03.015 ERLINGSSON Ú, PIHUR V, and KOROLOVA A. Rappor: Randomized aggregatable privacy-preserving ordinal response[C]. The 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, USA, 2014: 1054–1067. BASSILY R and SMITH A. Local, private, efficient protocols for succinct histograms[C]. The 47th Annual ACM Symposium on Theory of Computing, Portland, USA, 2015: 127–135. XIAO Yonghui and XIONG Li. Protecting locations with differential privacy under temporal correlations[C]. The 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, USA, 2015: 1298–1309. WANG Shuo, SINNOTT R, and NEPAL S. Protecting the location privacy of mobile social media users[C]. IEEE International Conference on Big Data, Washington, USA, 2016: 1143–1150. HARDT M and TALWAR K. On the geometry of differential privacy[C]. The 42nd ACM Symposium on Theory of Computing, Cambridge, UK, 2010: 705–714. SONG Shuang, WANG Yizhen, and CHAUDHURI K. Pufferfish privacy mechanisms for correlated data[C]. The 2017 ACM International Conference on Management of Data, Chicago, USA, 2017: 1291–1306. OU Lu, QIN Zheng, LIAO Shaolin, et al. An optimal pufferfish privacy mechanism for temporally correlated trajectories[J]. IEEE Access, 2018, 6: 37150–37165. doi: 10.1109/ACCESS.2018.2847720 DWORK C and ROTH A. The algorithmic foundations of differential privacy[J]. Foundations and Trends® in Theoretical Computer Science, 2014, 9(3/4): 211–407. doi: 10.1561/0400000042 KELLARIS G, PAPADOPOULOS S, XIAO Xiaokui, et al. Differentially private event sequences over infinite streams[J]. Proceedings of the VLDB Endowment, 2014, 7(12): 1155–1166. doi: 10.14778/2732977.2732989 ABADI M, CHU A, GOODFELLOW I, et al. Deep learning with differential privacy[C]. The 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 2016: 308–318. GAO Qiang, ZHOU Fan, ZHANG Kunpeng, et al. Identifying human mobility via trajectory embeddings[C]. The 26th International Joint Conference on Artificial Intelligence, Macao, China, 2017: 1689–1695. WANG Guowei, LIAO Dongliang, and LI Jing. Complete user mobility via user and trajectory embeddings[J]. IEEE Access, 2018, 6: 72125–72136. doi: 10.1109/ACCESS.2018.2881457 ZHOU Fan, GAO Qiang, TRAJCEVSKI G, et al. Trajectory-user linking via variational autoencoder[C]. The 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 3212–3218. LI Xiucheng, ZHAO Kaiqi, CONG Gao, et al. Deep representation learning for trajectory similarity computation[C]. The 34th IEEE International Conference on Data Engineering, Paris, France, 2018: 617–628. JORGENSEN Z, YU Ting, and CORMODE G. Conservative or liberal? Personalized differential privacy[C]. The 31st IEEE International Conference on Data Engineering, Seoul, South Korea, 2015: 1023–1034. LI Haoran, XIONG Li, JI Zhanglong, et al. Partitioning-based mechanisms under personalized differential privacy[C]. The 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, South Korea, 2017: 615–627. YE Yutong, ZHANG Min, FENG Dengguo, et al. Multiple privacy regimes mechanism for local differential privacy[C]. The 24th International Conference on Database Systems for Advanced Applications, Chiang Mai, Thailand, 2019: 247–263. CHEN Rui, LI Haoran, QIN A K, et al. Private spatial data aggregation in the local setting[C]. The 32nd IEEE International Conference on Data Engineering, Helsinki, Finland, 2016: 289–300.