基于差分隐私模型的位置轨迹发布技术研究

冯登国; 张敏; 叶宇桐

doi:10.11999/JEIT190632

基于差分隐私模型的位置轨迹发布技术研究

doi: 10.11999/JEIT190632 cstr: 32379.14.JEIT190632

冯登国^{1, 2, ,},
张敏^{1, 2},
叶宇桐¹

1.
中国科学院软件研究所可信计算与保障实验室北京 100190
2.
中国科学院软件研究所计算机科学国家重点实验室北京 100190

基金项目: 国家自然科学基金(U1636216)

详细信息

作者简介:
冯登国：男，1965年生，中国科学院院士，研究员，研究方向为网络与信息安全

张敏：女，1975年生，研究员，研究方向为数据安全与隐私保护

叶宇桐：男，1993年生，博士生，研究方向为差分隐私保护技术

通讯作者:
冯登国　feng@is.iscas.ac.cn

¹⁾ 在表2中的时间复杂度分析中，n表示轨迹或位置数据集合的样本数目，m表示轨迹平均长度。
中图分类号: TN918
计量
- 文章访问数: 6997
- HTML全文浏览量: 2329
- PDF下载量: 333
- 被引次数: 0
出版历程
- 收稿日期: 2019-08-26
- 修回日期: 2019-11-30
- 网络出版日期: 2019-12-05
- 刊出日期: 2020-01-21

Research on Differentially Private Trajectory Data Publishing

Dengguo FENG^{1, 2
, ,},
Min ZHANG^{1, 2},
Yutong YE¹

1.
Trusted Computing and Information Assurance Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
2.
State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

Funds: The National Natural Science Foundation of China (U1636216)

摘要

摘要:
位置轨迹大数据的安全分享、发布需求离不开位置轨迹隐私保护技术支持。在差分隐私出现之前，K-匿名及其衍生模型为位置轨迹隐私保护提供了一种量化评估的手段，但其安全性严重依赖于攻击者所掌握的背景知识，当有新的攻击出现时模型无法提供完善的隐私保护。差分隐私技术的出现有效地弥补了上述问题，越来越多地应用于轨迹数据隐私发布领域中。该文对基于差分隐私理论的轨迹隐私保护技术进行了研究与分析，重点介绍了差分隐私模型下位置直方图、轨迹直方图等空间统计数据发布方法，差分隐私模型下轨迹数据集发布方法，以及连续轨迹实时发布隐私保护模型。与此同时，在对现有方法对比分析的基础上，提出了未来的重点发展方向。
- 隐私保护 /
- 差分隐私 /
- 位置大数据 /
- 轨迹大数据 /
- 数据发布
Abstract:
Securely sharing and publishing location trajectory data relies on support of location privacy protection technology. Prior to the advent of differential privacy, K-anonymity and its derived models provide a means of quantitative assessment of location-trajectory privacy protection. However, its security relies heavily on the background knowledge of the attacker, and the model can not provide perfect privacy protection when a new attack occurs. Differential privacy effectively compensates for the above problems, and it proves the level of privacy protection based on rigorous mathematical theory and is increasingly used in the field of trajectory data privacy publishing. Therefore, the trajectory privacy protection technology based on differential privacy theory is studied and analyzed, and the methods of spatial statistical data publishing are introduced such as location histogram and trajectory histogram, the method of trajectory data set publishing and the model of continuous real-time location release privacy protection. At the same time, the existing methods are compared and analyzed, the key development directions are put forward in the future.
- Privacy preserving /
- Differential privacy /
- Location big data /
- Trajectory big data /
- Data publishing
¹⁾ 在表2中的时间复杂度分析中，n表示轨迹或位置数据集合的样本数目，m表示轨迹平均长度。

HTML全文

图 1 基于四叉树的差分隐私位置计数查询方法

下载: 全尺寸图片幻灯片

图 2 EH模型功能示意图

下载: 全尺寸图片幻灯片

图 3 轨迹序列的前缀树表示

下载: 全尺寸图片幻灯片

图 4 基于N-gram的搜索树方法

下载: 全尺寸图片幻灯片

图 5 轨迹片段在RS2前缀树中的表达示例

下载: 全尺寸图片幻灯片

图 6 轨迹聚类方法过程示意图

下载: 全尺寸图片幻灯片

表 1 轨迹序列数据集

序号	路径
1	L1→L2→L3
2	L1→L2
3	L3→L2→L1
4	L1→L2→L4
5	L1→L2→L3
6	L3→L2
7	L1→L2→L4→L1
8	L3→L1

下载: 导出CSV

表 2 基于差分隐私保护的轨迹信息发布方法比较

类型	数据发布方法	隐私保护模型	隐私保护机制	发布数据类型	时间复杂度分析
空间直方图	四叉索引树，KD-索引树，K叉平均树，PrivTree	$ \left( {\varepsilon ,\delta } \right) - {\rm{DP}}$	Laplace机制	位置直方图	噪音的全局敏感度和树高相关，构建四叉树结构的复杂度为O(n·lgn)
	EH-DP, privSH	ε-DP ε-DP	Laplace机制和指数机制	轨迹直方图	构建EH数据结构的复杂度为O (mn)
	位置熵(LE)	$ \left( {\varepsilon ,\delta } \right) - {\rm{DP}}$	Laplace机制	位置熵直方图	用到了平滑敏感度替代全局敏感度，添加噪音较少，计算所有地理位置的熵需要遍历用户的地理位置，复杂度为O(mn)
轨迹集合发布	前缀树，n-gram, DPT, PST	ε-DP	Laplace机制	移动轨迹数据集	构建前缀树的复杂度为O(mn·lgm)
轨迹集合发布	K-means聚类，OPT K-means聚类	ε-DP	Laplace机制和指数机制	移动轨迹数据集	若聚类为k，迭代次数平均为t，由于采用差分隐私指数机制，复杂度为O (mktn^k)
连续轨迹发布	δ-位置集合，(δ, r)-位置集合	ε-DP	Laplace机制	单个轨迹点	假设攻击者知道用户的移动模式，使噪音添加的更合理。时间复杂为O(m\|$\Delta $X\|) (其中\|$\Delta $X\|为构建位置集合大小)
连续轨迹发布	Wasserstein机制，MQM机制，FGS-PufferFish机制	ε-PufferFish隐私	Laplace机制	单条轨迹	时间复杂度为O (nkm²·lgm) (其中k为PufferFish定义的空间关联的最大区域)

下载: 导出CSV

参考文献(53)

BAO Jie, HE Tianfu, RUAN Sijie, et al. Planning bike lanes based on sharing-bikes’ trajectories[C]. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada, 2017: 1377–1386.

YUAN Jing, ZHENG Yu, ZHANG Chengyang, et al. T-drive: Driving directions based on taxi trajectories[C]. The 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, USA, 2010: 99–108.

TOCKAR A. Riding with the stars: Passenger privacy in the NYC Taxicab dataset[EB/OL]. https://research.neustar.biz/author/atockar/, 2018.

W-Pwn. 健身APP泄露军事机密, 包括中国南海[EB/OL]. https://zhuanlan.zhihu.com/p/33405626, 2018.

CHEN Zhenyu, FU Yanyan, ZHANG Min, et al. The de-anonymization method based on user spatio-temporal mobility trace[C]. The 19th International Conference on Information and Communications Security, Beijing, China, 2017: 459–471.

ABUL O, BONCHI F, and NANNI M. Never walk alone: Uncertainty for anonymity in moving objects databases[C]. The 24th IEEE International Conference on Data Engineering, Cancun, Mexico, 2008: 376–385.

TERROVITIS M, POULIS G, MAMOULIS N, et al. Local suppression and splitting techniques for privacy preserving publication of trajectories[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(7): 1466–1479. doi: 10.1109/TKDE.2017.2675420

YAO Lin, WANG Xinyu, WANG Xin, et al. Publishing sensitive trajectory data under enhanced l-diversity model[C]. The 20th IEEE International Conference on Mobile Data Management, Hong Kong, China, 2019: 160–169.

冯登国. 大数据安全与隐私保护[M]. 北京: 清华大学出版社, 2018: 194–219.

DWORK C. Differential privacy[C]. The 33rd International Colloquium on Automata, Languages and Programming, Venice, Italy, 2006: 1–12. doi: 10.1007/11787006_1.

ZHU Tianqing, LI Gang, ZHOU Wanlei, et al. Differentially private data publishing and analysis: A survey[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(8): 1619–1638. doi: 10.1109/TKDE.2017.2697856

MCSHERRY F and TALWAR K. Mechanism design via differential privacy[C]. The 48th Annual IEEE Symposium on Foundations of Computer Science, Providence, USA, 2007: 94–103.

MCSHERRY F D. Privacy integrated queries: An extensible platform for privacy-preserving data analysis[C]. The 2009 ACM SIGMOD International Conference on Management of Data, Rhode Island, USA, 2009: 19–30.

DWORK C, KENTHAPADI K, MCSHERRY F, et al. Our data, ourselves: Privacy via distributed noise generation[C]. The 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Peterbury, Russia, 2006: 486–503.

KIFER D and MACHANAVAJJHALA A. No free lunch in data privacy[C]. The 2011 ACM SIGMOD International Conference on Management of Data, Athens, Greece, 2011: 193–204.

GEHRKE J, LUI E, and PASS R. Towards privacy for social networks: A zero-knowledge based definition of privacy[C]. The 8th Conference on Theory of Cryptography, Providence, USA, 2011: 432–449.

KIFER D and MACHANAVAJJHALA A. Pufferfish: A framework for mathematical privacy definitions[J]. ACM Transactions on Database Systems, 2014, 39(1): Article No.3. doi: 10.1145/2514689

DWORK C. A firm foundation for private data analysis[J]. Communications of the ACM, 2011, 54(1): 86–95. doi: 10.1145/1866739.1866758

DWORK C, MCSHERRY F, NISSIM K, et al. Calibrating noise to sensitivity in private data analysis[C]. The 3rd Theory of Cryptography Conference, New York, USA, 2006: 265–284.

QARDAJI W, YANG Weining, and LI Ninghui. Differentially private grids for geospatial data[C]. The 29th IEEE International Conference on Data Engineering, Brisbane, Australia, 2013: 757–768.

CORMODE G, PROCOPIUC C, SRIVASTAVA D, et al. Differentially private spatial decompositions[C]. The 28th IEEE International Conference on Data Engineering, Washington, USA, 2012: 20–31.

ZHANG Jun, XIAO Xiaokui, and XIE Xing. PrivTree: A differentially private algorithm for hierarchical decompositions[C]. The 2016 International Conference on Management of Data, San Francisco, USA, 2016: 155–170.

HAY M, RASTOGI V, MIKLAU G, et al. Boosting the accuracy of differentially private histograms through consistency[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 1021–1032.

XIAO Yonghui, XIONG Li, and YUAN Chun. Differentially private data release through multidimensional partitioning[C]. The 7th Workshop on Secure Data Management, Singapore, 2010: 150–168.

XIE Hairuo, TANIN E, and KULIK L. Distributed histograms for processing aggregate data from moving objects[C]. 2007 International Conference on Mobile Data Management, Mannheim, Germany, 2007: 152–157.

XIE Hairuo, TANIN E, KULIK L, et al. Euler histogram tree: A spatial data structure for aggregate range queries on vehicle trajectories[C]. The 7th ACM SIGSPATIAL International Workshop on Computational Transportation Science, Dallas/Fort Worth, USA, 2014: 18–24.

GHANE S, KULIK L, and RAMAMOHANARAO K. Publishing spatial histograms under differential privacy[C]. The 30th International Conference on Scientific and Statistical Database Management, Bozen-Bolzano, Italy, 2018: Article No.14.

HARDT M, LIGETT K, and MCSHERRY F. A simple and practical algorithm for differentially private data release[C]. Advances in Neural Information Processing Systems, Lake Tahoe, USA, 2012: 2339–2347.

TO H, NGUYEN K, and SHAHABI C. Differentially private publication of location entropy[C]. The 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Burlingame, USA, 2016: Article No. 35.

CHEN Rui, FUNG B C M, DESAI B C, et al. Differentially private transit data publication: A case study on the montreal transportation system[C]. The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012: 213–221.

CHEN Rui, ACS G, and CASTELLUCCIA C. Differentially private sequential data publication via variable-length n-grams[C]. The 2012 ACM Conference on Computer and Communications Security, Raleigh, USA, 2012: 638–649.

HE Xi, CORMODE G, MACHANAVAJJHALA A, et al. DPT: Differentially private trajectory synthesis using hierarchical reference systems[J]. The VLDB Endowment, 2015, 8(11): 1154–1165. doi: 10.14778/2809974.2809978

HE Xi, RAVAL N, and MACHANAVAJJHALA A. A demonstration of VisDPT: Visual exploration of differentially private trajectories[J]. The VLDB Endowment, 2016, 9(13): 1489–1492. doi: 10.14778/3007263.3007291

HUA Jingyu, GAO Yue, and ZHONG Sheng. Differentially private publication of general time-serial trajectory data[C]. 2015 IEEE Conference on Computer Communications, Hong Kong, China, 2015: 549–557. doi: 10.1109/INFOCOM.2015.7218422.

LI Meng, ZHU Liehuang, ZHANG ZIjian, et al. Achieving differential privacy of trajectory data publishing in participatory sensing[J]. Information Sciences, 2017, 400/401: 1–13. doi: 10.1016/j.ins.2017.03.015

ERLINGSSON Ú, PIHUR V, and KOROLOVA A. Rappor: Randomized aggregatable privacy-preserving ordinal response[C]. The 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, USA, 2014: 1054–1067.

BASSILY R and SMITH A. Local, private, efficient protocols for succinct histograms[C]. The 47th Annual ACM Symposium on Theory of Computing, Portland, USA, 2015: 127–135.

XIAO Yonghui and XIONG Li. Protecting locations with differential privacy under temporal correlations[C]. The 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, USA, 2015: 1298–1309.

WANG Shuo, SINNOTT R, and NEPAL S. Protecting the location privacy of mobile social media users[C]. IEEE International Conference on Big Data, Washington, USA, 2016: 1143–1150.

HARDT M and TALWAR K. On the geometry of differential privacy[C]. The 42nd ACM Symposium on Theory of Computing, Cambridge, UK, 2010: 705–714.

SONG Shuang, WANG Yizhen, and CHAUDHURI K. Pufferfish privacy mechanisms for correlated data[C]. The 2017 ACM International Conference on Management of Data, Chicago, USA, 2017: 1291–1306.

OU Lu, QIN Zheng, LIAO Shaolin, et al. An optimal pufferfish privacy mechanism for temporally correlated trajectories[J]. IEEE Access, 2018, 6: 37150–37165. doi: 10.1109/ACCESS.2018.2847720

DWORK C and ROTH A. The algorithmic foundations of differential privacy[J]. Foundations and Trends® in Theoretical Computer Science, 2014, 9(3/4): 211–407. doi: 10.1561/0400000042

KELLARIS G, PAPADOPOULOS S, XIAO Xiaokui, et al. Differentially private event sequences over infinite streams[J]. Proceedings of the VLDB Endowment, 2014, 7(12): 1155–1166. doi: 10.14778/2732977.2732989

ABADI M, CHU A, GOODFELLOW I, et al. Deep learning with differential privacy[C]. The 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 2016: 308–318.

GAO Qiang, ZHOU Fan, ZHANG Kunpeng, et al. Identifying human mobility via trajectory embeddings[C]. The 26th International Joint Conference on Artificial Intelligence, Macao, China, 2017: 1689–1695.

WANG Guowei, LIAO Dongliang, and LI Jing. Complete user mobility via user and trajectory embeddings[J]. IEEE Access, 2018, 6: 72125–72136. doi: 10.1109/ACCESS.2018.2881457

ZHOU Fan, GAO Qiang, TRAJCEVSKI G, et al. Trajectory-user linking via variational autoencoder[C]. The 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 3212–3218.

LI Xiucheng, ZHAO Kaiqi, CONG Gao, et al. Deep representation learning for trajectory similarity computation[C]. The 34th IEEE International Conference on Data Engineering, Paris, France, 2018: 617–628.

JORGENSEN Z, YU Ting, and CORMODE G. Conservative or liberal? Personalized differential privacy[C]. The 31st IEEE International Conference on Data Engineering, Seoul, South Korea, 2015: 1023–1034.

LI Haoran, XIONG Li, JI Zhanglong, et al. Partitioning-based mechanisms under personalized differential privacy[C]. The 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, South Korea, 2017: 615–627.

YE Yutong, ZHANG Min, FENG Dengguo, et al. Multiple privacy regimes mechanism for local differential privacy[C]. The 24th International Conference on Database Systems for Advanced Applications, Chiang Mai, Thailand, 2019: 247–263.

CHEN Rui, LI Haoran, QIN A K, et al. Private spatial data aggregation in the local setting[C]. The 32nd IEEE International Conference on Data Engineering, Helsinki, Finland, 2016: 289–300.

施引文献

资源附件(0)

访问统计