An Interpretable Free-text Keystroke Event Sequence Classification Model

ZHANG Chang; HAN Jihong; ZHANG Yuchen; LI Fulin

doi: 10.11999/JEIT211567 cstr: 32379.14.JEIT211567

Information Engineering University, Zhengzhou 450000, China

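About the authors:
ZHANG Chang: male, lecturer; research interests: network security and event sequence classification

HAN Jihong: female, professor; research interests: network security and security protocol analysis

ZHANG Yuchen: male, professor; research interest: network security

LI Fulin: male, associate professor; research interest: network security

Corresponding author:
ZHANG Chang, zhang_chang_xd@163.com

Chinese Library Classification (CLC) number: TP181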
Publication history
- Received: 2021-12-27
- Revised: 2022-05-22
- Accepted: 2022-06-01
- Published online: 2022-06-07
- Issue date: 2023-02-07

Abstract

Abstract: TypeNet is a Siamese network whose branches are two-layer Long Short-Term Memory (LSTM) networks. It performs well on free-text keystroke event sequence classification but lacks interpretability. This paper therefore modifies TypeNet and proposes TypeNet II, a Siamese network whose branches are single-layer LSTM networks. In TypeNet II, a multi-layer perceptron measures the similarity of two feature sequences from the absolute value of the difference between the embeddings output by the two branches. After training, the multi-layer perceptron is approximated by a second-degree multivariate polynomial regression, and the fitted polynomial is used to explain the model's classification decisions. Experimental results show that TypeNet II classifies better than the existing TypeNet models, that the polynomial regression results generalize, and that the relationship between the absolute value of the embedding difference and the similarity measure is nonlinear.

Keywords:
- Siamese network
- Long Short-Term Memory (LSTM)
- Keystroke
- Multi-layer perceptron
- Interpretability
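The comparison step described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the 4-dimensional embeddings, the layer sizes, and all weights are hypothetical, and the LSTM branches that would produce the embeddings are omitted.

```python
import math


def abs_diff(e1, e2):
    """Comparison layer: element-wise |e1 - e2| of two branch embeddings."""
    if len(e1) != len(e2):
        raise ValueError("embeddings must have the same dimension")
    return [abs(a - b) for a, b in zip(e1, e2)]


def dense(x, weights, bias, activation):
    """One fully connected layer; weights[j] holds the weights of output unit j."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def similarity(e1, e2, params):
    """Small MLP applied to |e1 - e2|; returns a score in (0, 1)."""
    hidden = dense(abs_diff(e1, e2), params["w1"], params["b1"], relu)
    return dense(hidden, params["w2"], params["b2"], sigmoid)[0]


# Hypothetical parameters: 4 inputs -> 2 hidden units -> 1 output.
PARAMS = {
    "w1": [[-1.0, -1.0, -1.0, -1.0], [1.0, 1.0, 1.0, 1.0]],
    "b1": [1.0, 0.0],
    "w2": [[2.0, -2.0]],
    "b2": [0.0],
}

same = similarity([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4], PARAMS)
diff = similarity([0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.1, 0.0], PARAMS)
```

With these made-up weights, identical embeddings score higher than distant ones. In a trained model the weights would be learned, and the polynomial regression mentioned in the abstract would be fitted to the input-output behaviour of this MLP part.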

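Figure 1 Structure of TypeNet II

Figure 2 Multivariate polynomial regression model of the decision layer

Figure 3 Training and validation accuracy of the TypeNet II model

Figure 4 Classification performance versus the number of negative-sample classes

Figure 5 Violin plot of the comparison-layer values produced by the TypeNet II model with 128-dimensional embeddings

Figure 6 Violin plot of the comparison-layer values produced by the TypeNet II model with 64-dimensional embeddings

Figure 7 Visualization of the comparison layer and the model output for the feature sequences of subject 175380

Figure 8 Visualization of the comparison layer and the model output for the feature sequences of subject 179773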

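Table 1 Main hyperparameters of the TypeNet II model

Hyperparameter	Number of LSTM units	Dropout rate between LSTM hidden layers	Dropout rate between the LSTM and the branch output Dense layer	Number of units in the branch output Dense layer and in the decision-layer input Dense layer	Optimizer	Initial learning rate	Batch size
Value	128	0.2	0.5	One of {128, 64, 32, 3, 2}	Nadam	0.001	512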

Table 2 Best validation accuracy reached in training by the TypeNet II model for each embedding dimension

Embedding dimension	128	64	32	3	2
Validation accuracy (%)	95.79	93.91	80.08	82.43	82.24

Table 3 Classification performance of the models

Model	TypeNet: contrastive loss	TypeNet: triplet loss	TypeNet: SM-CL, G=6	TypeNet: SM-TL, G=6	TypeNet II: 128-dimensional embedding	TypeNet II: 64-dimensional embedding
Equal error rate (EER, %)	5.4^[13]	2.2^[13]	2.42^[13]	1.85^[13]	1.76	2.12
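Table 3 compares models by equal error rate. As a reminder of what that metric measures, here is a minimal sketch of one common way to estimate it from verification scores. The scores are made up for illustration and are not from the paper, and the convention that a higher score means "same user" is an assumption.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) for similarity scores where higher means 'same user'.

    FRR(t) is the fraction of genuine scores below t (falsely rejected).
    FAR(t) is the fraction of impostor scores at or above t (falsely accepted).
    The EER is estimated at the observed score where |FAR - FRR| is smallest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Illustrative scores only.
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.3]
impostor_scores = [0.1, 0.2, 0.3, 0.4, 0.7]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

On this toy data one genuine score falls below the chosen threshold and one impostor score lies above it, so both error rates are 20%. Perfectly separated score lists would give an EER of 0.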

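Table 4 $R^2$ of the multivariate polynomial $f$ on $P_{\text{tr}}$ for different polynomial degrees, for the two embedding dimensions

Embedding dimension	128	128	128	64	64	64
Polynomial degree	1	2	3	1	2	3
Training $R^2$	0.820092	0.915693	-0.547854	0.820032	0.919382	-0.566558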
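The negative $R^2$ values in Table 4 for degree 3 are possible because $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$ drops below zero whenever a model's squared error exceeds that of always predicting the mean. A minimal sketch with made-up numbers (not the paper's data) shows the three regimes:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


y = [0.0, 0.0, 1.0, 1.0]
good = r_squared(y, [0.1, 0.0, 0.9, 1.0])      # close fit, R^2 near 1
baseline = r_squared(y, [0.5, 0.5, 0.5, 0.5])  # always predict the mean, R^2 = 0
bad = r_squared(y, [1.0, 1.0, 0.0, 0.0])       # worse than the mean, R^2 < 0
```

This only illustrates the definition. It does not by itself explain why the degree-3 fit in Table 4 performs worse than the mean predictor.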

Table 5 Results of second-degree multivariate polynomial ridge regression on $P_{\text{tr}}$ for the two embedding dimensions

Embedding dimension	128	64
$\lambda$	2.00	2.00
Test $R^2$	0.94	0.92
MSE	0.01	0.02
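To make the procedure behind Table 5 concrete, the following is a minimal, dependency-free sketch of ridge regression on second-degree polynomial features. It is a generic illustration under stated assumptions, not the authors' code: the data are synthetic, the helper names (`poly2_features`, `ridge_fit`, `predict`) are hypothetical, and leaving the intercept unpenalised is a choice made here for the sketch.

```python
from itertools import combinations_with_replacement


def poly2_features(v):
    """Return [1, v_i..., v_i * v_j for i <= j] for one input vector v."""
    pairs = combinations_with_replacement(range(len(v)), 2)
    return [1.0] + list(v) + [v[i] * v[j] for i, j in pairs]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(rows, y, lam):
    """Minimise ||y - Xw||^2 + lam * ||w[1:]||^2; the intercept w[0] is not penalised."""
    x = [poly2_features(r) for r in rows]
    p = len(x[0])
    gram = [[sum(xi[a] * xi[b] for xi in x) for b in range(p)] for a in range(p)]
    for k in range(1, p):
        gram[k][k] += lam
    rhs = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(p)]
    return solve(gram, rhs)


def predict(w, v):
    """Evaluate the fitted second-degree polynomial at v."""
    return sum(wi * fi for wi, fi in zip(w, poly2_features(v)))


# Synthetic data (not from the paper): a known second-degree target on a 3 x 3 grid.
grid = [0.0, 0.5, 1.0]
rows = [[a, b] for a in grid for b in grid]
y = [1.0 - 0.8 * a - 0.5 * a * b for a, b in rows]

w_exact = ridge_fit(rows, y, lam=0.0)  # plain least squares recovers the target
w_ridge = ridge_fit(rows, y, lam=2.0)  # lambda = 2.00, the value listed in Table 5
```

With `lam=0.0` the fit recovers the generating coefficients, while `lam=2.0` shrinks the non-intercept coefficients toward zero. In the paper's setting the inputs would be the comparison-layer values and the targets the outputs of the trained multi-layer perceptron.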

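Table 6 Terms of the second-degree multivariate polynomial on $P_{\text{tr}}$ whose absolute coefficient exceeds 0.5, for 128-dimensional embeddings

Term	$v_{95} \times v_{96}$	$v_{119}$	$v_{8}$	$v_{70}$	$v_{63}$	$v_{55}$	$v_{115}$	$v_{12}$	$v_{89}$	$v_{85}$	$v_{29}$	$v_{77}$	$v_{37}$	$v_{65}$	$v_{84}$	$v_{43}$	$v_{111}$	$v_{20}$
Coefficient	0.50	0.56	0.56	0.64	0.65	0.66	0.69	0.74	0.75	0.77	0.85	0.87	0.92	0.92	1.26	1.28	1.29	1.47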

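Table 7 Terms of the second-degree multivariate polynomial on $P_{\text{tr}}$ whose absolute coefficient exceeds 0.4, for 64-dimensional embeddings

Term	$v_{21}$	$v_{45}$	$v_{56}$	$v_{31} \times v_{32}$	$v_{13}$	$v_{28}$	$v_{27}$	$v_{33}$
Coefficient	-0.54	-0.41	0.40	0.41	0.42	0.47	0.49	0.78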

Table 8 Results of polynomial ridge regression on $P_{\text{te}}$ for the two embedding dimensions

Embedding dimension	128	64
$\lambda$	2.00	2.00
Test $R^2$	0.95	0.92
MSE	0.01	0.02

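Table 9 Terms of the second-degree multivariate polynomial on $P_{\text{te}}$ whose absolute coefficient exceeds 0.5, for 128-dimensional embeddings

Term	$v_{70}$	$v_{63}$	$v_{119}$	$v_{89}$	$v_{55}$	$v_{115}$	$v_{29}$	$v_{12}$	$v_{77}$	$v_{85}$	$v_{65}$	$v_{37}$	$v_{43}$	$v_{84}$	$v_{111}$	$v_{20}$
Coefficient	0.54	0.56	0.60	0.64	0.64	0.72	0.76	0.84	0.84	0.85	0.89	0.94	1.14	1.15	1.17	1.31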

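Table 10 Terms of the second-degree multivariate polynomial on $P_{\text{te}}$ whose absolute coefficient exceeds 0.4, for 64-dimensional embeddings

Term	$v_{21}$	$v_{31}$	$v_{18} \times v_{19}$	$v_{13}$	$v_{28}$	$v_{27}$	$v_{33}$
Coefficient	-0.44	0.40	0.42	0.42	0.46	0.52	0.78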
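One simple way to inspect how far the dominant terms agree between $P_{\text{tr}}$ and $P_{\text{te}}$ is to intersect the term lists of Tables 7 and 10. The sketch below copies the 64-dimensional coefficients from those two tables. The comparison itself (shared terms, sign agreement, largest coefficient gap) is an illustrative check chosen here, not a procedure taken from the paper.

```python
# Coefficients copied from Table 7 (P_tr) and Table 10 (P_te), 64-dimensional embeddings.
TRAIN = {"v21": -0.54, "v45": -0.41, "v56": 0.40, "v31*v32": 0.41,
         "v13": 0.42, "v28": 0.47, "v27": 0.49, "v33": 0.78}
TEST = {"v21": -0.44, "v31": 0.40, "v18*v19": 0.42, "v13": 0.42,
        "v28": 0.46, "v27": 0.52, "v33": 0.78}


def shared_terms(a, b):
    """Terms listed in both tables as (term, coef_a, coef_b), sorted by |coef_a| descending."""
    common = sorted(set(a) & set(b), key=lambda t: abs(a[t]), reverse=True)
    return [(t, a[t], b[t]) for t in common]


shared = shared_terms(TRAIN, TEST)
same_sign = all((ca > 0) == (cb > 0) for _, ca, cb in shared)
largest_gap = max(abs(ca - cb) for _, ca, cb in shared)

for term, ca, cb in shared:
    print(f"{term}: train {ca:+.2f}, test {cb:+.2f}")
```

Five of the listed terms appear in both tables with matching signs, and their coefficients differ by at most about 0.10. Terms missing from one table may simply fall just below the 0.4 cut-off there.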

References (20)

[1]	GAINES R S, LISOWSKI W, PRESS S J, et al. Authentication by keystroke timing: Some preliminary results[R]. R-2526-NSF, 1980.
[2]	ACIEN A, MORALES A, MONACO J V, et al. TypeNet: Deep learning keystroke biometrics[J]. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2022, 4(1): 57–70. doi: 10.1109/TBIOM.2021.3112540
[3]	MONROSE F and RUBIN A D. Keystroke dynamics as a biometric for authentication[J]. Future Generation Computer Systems, 2000, 16(4): 351–359. doi: 10.1016/S0167-739X(99)00059-X
[4]	CURTIN M, TAPPERT C, VILLANI M, et al. Keystroke biometric recognition on long-text input: A feasibility study[C]. Student/Faculty Research Day, CSIS, Pace University, New York City, USA, 2006.
[5]	AYOTTE B, HUANG Jiaju, BANAVAR M K, et al. Fast continuous user authentication using distance metric fusion of free-text keystroke data[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, USA, 2019.
[6]	AYOTTE B, BANAVAR M, HOU Daqing, et al. Fast free-text authentication via instance-based keystroke dynamics[J]. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2(4): 377–387. doi: 10.1109/TBIOM.2020.3003988
[7]	BERGADANO F, GUNETTI D, and PICARDI C. User authentication through keystroke dynamics[J]. ACM Transactions on Information and System Security, 2002, 5(4): 367–397. doi: 10.1145/581271.581272
[8]	GUNETTI D and PICARDI C. Keystroke analysis of free text[J]. ACM Transactions on Information and System Security, 2005, 8(3): 312–347. doi: 10.1145/1085126.1085129
[9]	KANG P and CHO S. Keystroke dynamics-based user authentication using long and free text strings from various input devices[J]. Information Sciences, 2015, 308: 72–93. doi: 10.1016/j.ins.2014.08.070
[10]	SINGH S and ARYA K V. Key classification: A new approach in free text keystroke authentication system[C]. 2011 Third Pacific-Asia Conference on Circuits, Communications and System (PACCS), Wuhan, China, 2011: 1–5.
[11]	TAPPERT C C, VILLANI M, and CHA S H. Keystroke biometric identification and authentication on long-text input[M]. WANG Ling and GENG Xin. Behavioral Biometrics for Human Identification: Intelligent Applications. Hershey: Medical Information Science Reference, 2010: 342–367.
[12]	MONACO J V and TAPPERT C C. The partially observable hidden Markov model and its application to keystroke dynamics[J]. Pattern Recognition, 2018, 76: 449–462. doi: 10.1016/j.patcog.2017.11.021
[13]	LU Xiaofeng, ZHANG Shengfei, and YI Shengwei. Free-text keystroke continuous authentication using CNN and RNN[J]. Journal of Tsinghua University: Science and Technology, 2018, 58(12): 1072–1078. doi: 10.16511/j.cnki.qhdxxb.2018.26.048
[14]	DHAKAL V, FEIT A M, KRISTENSSON P O, et al. Observations on typing from 136 million keystrokes[C]. The 2018 CHI Conference on Human Factors in Computing Systems, Montreal, Canada, 2018.
[15]	MORALES A, FIERREZ J, ACIEN A, et al. SetMargin loss applied to deep keystroke biometrics with circle packing interpretation[J]. Pattern Recognition, 2021, 122: 108283. doi: 10.1016/j.patcog.2021.108283
[16]	AYOTTE B, BANAVAR M K, HOU Daqing, et al. Group leakage overestimates performance: A case study in keystroke dynamics[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, USA, 2021.
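[17]	Custom development of BioTracker with the U.S. Department of Defense's Defense Innovation Unit Experimental and the U.S. Army Network Enterprise Technology Command[EB/OL]. https://www.biometricupdate.com/201708/u-s-army-to-deploy-plurilock-behavioral-iometric-id-authentication-solution, 2017.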
[18]	PATEL Y, OUAZZANE K, VASSILEV V T, et al. Keystroke dynamics using auto encoders[C]. 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), Oxford, UK, 2019.
[19]	LI Zengpeng, WANG Ding, and MORAIS E. Quantum-safe round-optimal password authentication for mobile devices[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(3): 1885–1899. doi: 10.1109/TDSC.2020.3040776
[20]	WANG Ding, WANG Ping, and LEI Ming. Cryptanalysis and improvement of gateway-oriented password authenticated key exchange protocol based on RSA[J]. Acta Electronica Sinica, 2015, 43(1): 176–184. doi: 10.3969/j.issn.0372-2112.2015.01.028