面向中文搜索的网络加密流量侧信道分析方法

李玎; 林伟; 芦斌; 祝跃飞

doi:10.11999/JEIT210289

面向中文搜索的网络加密流量侧信道分析方法

doi: 10.11999/JEIT210289 cstr: 32379.14.JEIT210289

1.
战略支援部队信息工程大学郑州 450001
2.
数学工程与先进计算国家重点实验室郑州 450001

基金项目: 国家重点研发计划(2019QY1302)

详细信息

作者简介:
李玎：男，1992年生，博士生，研究方向为网络信息安全与机器学习

林伟：男，1986年生，讲师，研究方向为网络信息安全与软件分析

芦斌：男，1983年生，副教授，研究方向为网络信息安全与机器学习

祝跃飞：男，1962年生，教授，研究方向为网络信息安全与公钥密码

通讯作者:
林伟　leebeep@126.com

中图分类号: TN915; TN918
计量
- 文章访问数: 696
- HTML全文浏览量: 748
- PDF下载量: 79
- 被引次数: 0
出版历程
- 收稿日期: 2021-04-08
- 修回日期: 2021-12-15
- 录用日期: 2022-01-12
- 网络出版日期: 2022-01-21
- 刊出日期: 2022-05-25

Network Encrypted Traffic Side-channel Analysis on Chinese Search

1.
PLA Information Engineering University, Zhengzhou 450001, China
2.
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China

Funds: The National Key R&D Program of China (2019QY1302)

摘要

摘要: 搜索引擎中的增量式搜索服务通过发送实时请求为用户更新建议列表。针对搜索加密流量存在的信息泄露，该文提出一种面向中文搜索的侧信道分析方法，利用搜索请求数据包长度增量和时间间隔的可区分性，构建了3阶段的分析模型以实现对用户输入查询的识别。实验结果表明，该方法在4个常用中文搜索引擎中的识别性能均达到理论量化值，对包含1.4×10⁵查询监控集的综合识别准确率达到76%。最后通过评估4种针对性的缓解机制，证明了通过阻断信息泄露来源可有效防御侧信道分析。
- 加密流量 /
- 侧信道分析 /
- 搜索引擎 /
- 信息增益
Abstract: Incremental search services in search engines update the suggestion list for users by sending real-time requests. Focusing on the information leakage of encrypted search traffic, a side-channel analysis method on Chinese search is proposed. Leveraging the distinguishability of packet length increments and time intervals, a three-stage analysis model is constructed to identify user queries. Experimental results show that the performance in four commonly used Chinese search engines achieves the theoretical quantified value. The identification accuracy for the set containing 1.4×10⁵ monitored queries reaches 76%. Finally, four mitigation methods are evaluated to demonstrate that side-channel analysis can be effectively defended by blocking the information leakage sources.
- Encrypted traffic /
- Side-channel analysis /
- Search engine /
- Information gain

HTML全文

图 1 AJAX增量式搜索实例

下载: 全尺寸图片幻灯片

图 2 HTTP/2头部哈夫曼编码字符串

下载: 全尺寸图片幻灯片

图 3 数据包长度增量泄露信息量化

下载: 全尺寸图片幻灯片

图 4 数据包时间间隔泄露信息量化

下载: 全尺寸图片幻灯片

图 5 查询字符串长度增量DFA模型

下载: 全尺寸图片幻灯片

图 6 击键时间间隔预测双字母对的BRNN模型

下载: 全尺寸图片幻灯片

图 7 整体性能评估

下载: 全尺寸图片幻灯片

图 8 缓解机制评估

下载: 全尺寸图片幻灯片

表 1 常见单行模式拼音输入法

操作系统	输入法	应用类型	分隔符
Windows	微软拼音	内置	撇号
Windows	QQ拼音	安装	撇号
macOS	简体拼音	内置	空格
macOS	搜狗拼音	安装	撇号
macOS	百度拼音	安装	撇号
iOS	简体拼音	内置	空格

下载: 导出CSV

表 2 URL参数对数据包长度增量可区分性的影响

协议	参数	参数变化	$d(x \in {\boldsymbol{A} })$	$d(x \in {\boldsymbol{B} })$
HTTP/1.1	无	–	1	4
	cp	产生进位	2	5
	pwd	末尾增加${\boldsymbol{A} }$	2	5
	pwd	末尾增加${\boldsymbol{B} }$	5	8
HTTP/2	无	–	0, 1	2, 3
	cp	模10大于2	0, 1	2, 3, 4
	cp	产生进位	1, 2	3, 4

下载: 导出CSV

表 3 击键探测算法

输入：未知数据包长度序列$ \{ {s_1},{s_2}, \cdots ,{s_N}\} $，DFA模型$ M $
输出：击键数据包状态序列$ {\boldsymbol{Q}} = \{ {q_1},{q_2}, \cdots ,{q_n}\} $
(1) 初始化$ N $个数据包状态序列${ {\boldsymbol{Q} }_i} = \varnothing$；
(2) for $ i $ in range($ N $) do
(3) 　DFA起始状态$ {q_i} = {q_0} $；
(4) 　for $ j $ in range($ i - 1 $) do
(5) 　　数据包长度增量$ {d_i} = {s_i} - {s_j} $；
(6) 　　if URL中含有pwd或cp参数 then
(7) 　　　根据$ {{\boldsymbol{Q}}_j} $上一接收状态$ {q_j} $调整$ {d_i} $；
(8) 　　 DFA状态转移$ {q_i} = \delta ({q_j},{d_i}) $；
(9) 　　if ${q_i} \ne \varnothing$ and $ \left\| {{{\boldsymbol{Q}}_i}} \right\| < \left\| {{{\boldsymbol{Q}}_j}} \right\| $ then
(10) 　　　更新已接收的最长前缀$ {{\boldsymbol{Q}}_i} = {{\boldsymbol{Q}}_j} $；
(11) 以第$ i $个包结尾的最长序列$ {{\boldsymbol{Q}}_i} = {{\boldsymbol{Q}}_i} + {q_i} $；
(12) return 最长接收序列$ {\boldsymbol{Q}} \in \{ {{\boldsymbol{Q}}_i}\} $。

下载: 导出CSV

表 4 多层匹配算法

输入：数据包状态序列$ {\boldsymbol{Q}} = \{ {q_1},{q_2}, \cdots ,{q_n}\} $，中文查询$ C $
输出：$ C $是否匹配状态序列$ {\boldsymbol{Q}} $
(1) 将$ C $转换为拼音字符串$ {\boldsymbol{X}} $；
(2) if 拼音字母长度$ \left\| {\boldsymbol{X}} \right\| \ne n $ then
(3) 　　return False；
(4) if 约简状态序列$ {\boldsymbol{\bar Q'}} \ne {\boldsymbol{\bar Q}} $ then
(5) 　　return False；
(6) for r in range(8) do
(7) 　　if 初始比特余数为$ r $的状态序列${ {\boldsymbol Q}'_r} = {\boldsymbol{Q} }$ then
(8) 　　　　return True；
(9) return False.

下载: 导出CSV

表 5 击键探测性能结果(%)及对比

探测方法	网站	HTTP	F₁分数	F₁=100
本文方法	百度	1.1	98.63	98.22
Monaco方法	百度	1.1	96.85	86.70
本文方法	搜狗	1.1	96.40	95.17
本文方法	360	1.1	99.51	99.33
本文方法	必应	2	99.72	99.61
Monaco方法	谷歌	2	97.26	81.12

下载: 导出CSV

表 6 查询识别准确率结果(%)及对比

分析方法	网站	样本数	监控数	准确率
本文方法	百度	1800	～1.4×10⁵	64.56
Monaco方法	百度	4000	∞	12.85
本文方法	搜狗	1800	～1.4×10⁵	61.32
本文方法	360	1800	～1.4×10⁵	65.94
本文方法	必应	1800	～1.4×10⁵	76.06
Oh等人方法	必应	100	1×10⁴	44.30
Schaub等人方法	谷歌	10	～6.8×10³	34.00
Monaco方法	谷歌	4000	∞	15.83

下载: 导出CSV

表 7 查询推断消融实验(%)

推断模型	百度	搜狗	360	必应
BRNN	64.56	61.32	65.94	76.06
RNN	64.11	60.39	65.28	75.44
HMM	61.89	58.28	62.61	73.56
MLP	61.28	57.67	61.50	72.17
Equal	57.94	54.78	59.33	69.83

下载: 导出CSV

参考文献(18)

[1]	武思齐, 王俊峰. 基于数据流多维特征的移动流量识别方法研究[J]. 四川大学学报:自然科学版, 2020, 57(2): 247–254. doi: 10.3969/j.issn.0490-6756.2020.02.008 WU Siqi and WANG Junfeng. Research on mobile traffic identification based on multidimensional characteristics of data flow[J]. Journal of Sichuan University:Natural Science Edition, 2020, 57(2): 247–254. doi: 10.3969/j.issn.0490-6756.2020.02.008
[2]	SIRINAM P, MATHEWS N, RAHMAN M S, et al. Triplet fingerprinting: More practical and portable website fingerprinting with N-shot learning[C]. The 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 2019: 1131–1148.
[3]	GU Jiaxi, WANG Jiliang, YU Zhiwen, et al. Traffic-based side-channel attack in video streaming[J]. IEEE/ACM Transactions on Networking, 2019, 27(3): 972–985. doi: 10.1109/TNET.2019.2906568
[4]	WHITE A M, MATTHEWS A R, SNOW K Z, et al. Phonotactic reconstruction of encrypted VoIP conversations: Hookt on fon-iks[C]. 2011 IEEE Symposium on Security and Privacy, Oakland, USA, 2011: 3–18.
[5]	LI Hong, HE Yunhua, SUN Limin, et al. Side-channel information leakage of encrypted video stream in video surveillance systems[C]. The 35th Annual IEEE International Conference on Computer Communications, San Francisco, USA, 2016: 1–9.
[6]	SONG D X, WAGNER D, and TIAN Xuqing. Timing analysis of keystrokes and timing attacks on SSH[C]. The 10th conference on USENIX Security Symposium, Washington, USA, 2001: 25.
[7]	CHEN Shuo, WANG Rui, WANG Xiaofeng, et al. Side-channel leaks in web applications: A reality today, a challenge tomorrow[C]. 2010 IEEE Symposium on Security and Privacy, Oakland, USA, 2010: 191–206.
[8]	SCHAUB A, SCHNEIDER E, HOLLENDER A, et al. Attacking suggest boxes in web applications over HTTPS using side-channel stochastic algorithms[C]. The International Conference on Risks and Security of Internet and Systems, Trento, Italy, 2014: 116–130.
[9]	OH S E, LI Shuai, and HOPPER N. Fingerprinting keywords in search queries over tor[J]. Proceedings on Privacy Enhancing Technologies, 2017, 2017(4): 251–270. doi: 10.1515/popets-2017-0048
[10]	MONACO J V. What are you searching for? a remote keylogging attack on search engine autocomplete[C]. The 28th USENIX Conference on Security Symposium, Santa Clara, USA, 2019: 959–976.
[11]	FITTS P M. The information capacity of the human motor system in controlling the amplitude of movement[J]. Journal of Experimental Psychology, 1954, 47(6): 381–391. doi: 10.1037/h0055392
[12]	DHAKAL V, FEIT A M, KRISTENSSON P O, et al. Observations on typing from 136 million keystrokes[C]. The 2018 CHI Conference on Human Factors in Computing Systems, Montreal, Canada, 2018: 646.
[13]	Verizon: IP latency statistics[EB/OL]. https://www.verizon.com/business/terms/latency/, 2021.
[14]	KILLOURHY K S and AXION R A. Free vs. transcribed text for keystroke-dynamics evaluations[C]. The 2012 Workshop on Learning from Authoritative Security Experiment Results, Arlington, USA, 2012: 1–8.
[15]	SCHUSTER M and PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673–2681. doi: 10.1109/78.650093
[16]	SCHWARZ M, LIPP M, GRUSS D, et al. KeyDrown: Eliminating software-based keystroke timing side-channel attacks[C]. The 25th Annual Network and Distributed System Security Symposium, San Diego, USA, 2018.
[17]	BALSA E, TRONCOSO C, and DIAZ C. OB-PWS: Obfuscation-based private web search[C]. 2012 IEEE Symposium on Security and Privacy, San Francisco, USA, 2012: 491–505.
[18]	ZHENG Li, ZHANG Liren, and XU Dong. Characteristics of network delay and delay jitter and its effect on voice over IP (VoIP)[C]. 2001 IEEE International Conference on Communications, Helsinki, Finland, 2001: 122–126.