An Interpretable Free-text Keystroke Event Sequence Classification Model
Abstract: TypeNet is a Siamese network whose branches are two-layer Long Short-Term Memory (LSTM) networks. It performs well at classifying free-text keystroke event sequences but lacks interpretability. This paper therefore modifies TypeNet and proposes TypeNet II, a Siamese network whose branches are single-layer LSTMs. In TypeNet II, a multi-layer perceptron takes the absolute value of the difference between the embeddings output by the two branches and measures the similarity of the two feature sequences. After training, the multi-layer perceptron is approximated by a multivariate second-degree polynomial fitted by regression, and the model's classification decisions are explained from that polynomial. The experimental results show that the classification performance of TypeNet II exceeds that of the existing TypeNet model, that the polynomial regression results generalize, and that the relationship between the absolute value of the embedding difference and the similarity measure is nonlinear.
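The decision stage described in the abstract can be illustrated with a minimal Python sketch: the two branch embeddings are reduced to their element-wise absolute difference, which a small perceptron maps to a score. The abstract does not state the perceptron's layer sizes, activations, or weights, so the single hidden layer, the ReLU and sigmoid activations, the 3-dimensional embeddings, and every number in `PARAMS` are assumptions chosen only for illustration.

```python
import math

def abs_difference(e1, e2):
    """Element-wise |e1 - e2| of two branch embeddings of equal length."""
    if len(e1) != len(e2):
        raise ValueError("embeddings must have the same dimension")
    return [abs(a - b) for a, b in zip(e1, e2)]

def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b), with W given row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def similarity_score(e1, e2, params):
    """Score a pair of embeddings with a one-hidden-layer perceptron (assumed shape)."""
    v = abs_difference(e1, e2)
    hidden = dense(v, params["w1"], params["b1"], relu)
    return dense(hidden, params["w2"], params["b2"], sigmoid)[0]

# Hypothetical weights and embeddings, not taken from the paper.
PARAMS = {
    "w1": [[1.0, -0.5, 0.3], [-0.8, 0.6, 0.2]],
    "b1": [0.1, 0.0],
    "w2": [[-1.2, 0.7]],
    "b2": [0.05],
}
emb_a = [0.2, 0.9, 0.4]
emb_b = [0.5, 0.1, 0.4]
score = similarity_score(emb_a, emb_b, PARAMS)
```

Because the perceptron only sees the absolute difference, the score is the same whichever branch receives which sequence.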
Keywords:
- Siamese network
- Long Short-Term Memory (LSTM)
- Keystroke
- Multi-layer perceptron
- Interpretability
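Table 1 Main hyperparameters of the TypeNet II model

| Hyperparameter | Value |
| --- | --- |
| Number of neurons in the LSTM layer | 128 |
| Dropout ratio between LSTM hidden layers | 0.2 |
| Dropout ratio between the LSTM and the branch's output Dense layer | 0.5 |
| Number of neurons in the branch's output Dense layer and in the decision layer's input Dense layer | a value from the set {128, 64, 32, 3, 2} |
| Optimizer | Nadam |
| Initial learning rate | 0.001 |
| Batch size | 512 |

Table 2 Best validation accuracy during training for different embedding dimensions of the TypeNet II model

| Embedding dimension | 128 | 64 | 32 | 3 | 2 |
| --- | --- | --- | --- | --- | --- |
| Validation accuracy (%) | 95.79 | 93.91 | 80.08 | 82.43 | 82.24 |

Table 3 Classification performance of the models

Table 4 $R^2$ of the multivariate polynomial $f$ on $P_{\text{tr}}$ for different degrees, at two embedding dimensions

| Embedding dimension | Degree | Training $R^2$ |
| --- | --- | --- |
| 128 | 1 | 0.820092 |
| 128 | 2 | 0.915693 |
| 128 | 3 | -0.547854 |
| 64 | 1 | 0.820032 |
| 64 | 2 | 0.919382 |
| 64 | 3 | -0.566558 |

Table 5 Ridge regression results for the multivariate second-degree polynomial on $P_{\text{tr}}$, at two embedding dimensions

| Embedding dimension | 128 | 64 |
| --- | --- | --- |
| $\lambda$ | 2.00 | 2.00 |
| Test $R^2$ | 0.94 | 0.92 |
| MSE | 0.01 | 0.02 |

Table 6 Terms of the multivariate second-degree polynomial on $P_{\text{tr}}$ whose coefficients exceed 0.5 in absolute value (128-dimensional embedding)

| Term | Coefficient |
| --- | --- |
| v_95×v_96 | 0.50 |
| v_119 | 0.56 |
| v_8 | 0.56 |
| v_70 | 0.64 |
| v_63 | 0.65 |
| v_55 | 0.66 |
| v_115 | 0.69 |
| v_12 | 0.74 |
| v_89 | 0.75 |
| v_85 | 0.77 |
| v_29 | 0.85 |
| v_77 | 0.87 |
| v_37 | 0.92 |
| v_65 | 0.92 |
| v_84 | 1.26 |
| v_43 | 1.28 |
| v_111 | 1.29 |
| v_20 | 1.47 |

Table 7 Terms of the multivariate second-degree polynomial on $P_{\text{tr}}$ whose coefficients exceed 0.4 in absolute value (64-dimensional embedding)

| Term | Coefficient |
| --- | --- |
| v_21 | -0.54 |
| v_45 | -0.41 |
| v_56 | 0.40 |
| v_31×v_32 | 0.41 |
| v_13 | 0.42 |
| v_28 | 0.47 |
| v_27 | 0.49 |
| v_33 | 0.78 |

Table 8 Polynomial ridge regression results on $P_{\text{te}}$, at two embedding dimensions

| Embedding dimension | 128 | 64 |
| --- | --- | --- |
| $\lambda$ | 2.00 | 2.00 |
| Test $R^2$ | 0.95 | 0.92 |
| MSE | 0.01 | 0.02 |

Table 9 Terms of the multivariate second-degree polynomial on $P_{\text{te}}$ whose coefficients exceed 0.5 in absolute value (128-dimensional embedding)

| Term | Coefficient |
| --- | --- |
| v_70 | 0.54 |
| v_63 | 0.56 |
| v_119 | 0.60 |
| v_89 | 0.64 |
| v_55 | 0.64 |
| v_115 | 0.72 |
| v_29 | 0.76 |
| v_12 | 0.84 |
| v_77 | 0.84 |
| v_85 | 0.85 |
| v_65 | 0.89 |
| v_37 | 0.94 |
| v_43 | 1.14 |
| v_84 | 1.15 |
| v_111 | 1.17 |
| v_20 | 1.31 |

Table 10 Terms of the multivariate second-degree polynomial on $P_{\text{te}}$ whose coefficients exceed 0.4 in absolute value (64-dimensional embedding)

| Term | Coefficient |
| --- | --- |
| v_21 | -0.44 |
| v_31 | 0.40 |
| v_18×v_19 | 0.42 |
| v_13 | 0.42 |
| v_28 | 0.46 |
| v_27 | 0.52 |
| v_33 | 0.78 |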
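The surrogate step summarized in Tables 4 to 10 (ridge regression on second-degree polynomial terms, scored by $R^2$, then reading off the terms with large coefficients) might be sketched in plain Python as follows. This is not the authors' code: the toy target, the 2-dimensional input, the value `lam=1e-3`, the 0.3 threshold, and all function names are assumptions for illustration (the tables report $\lambda$ = 2.00 on 128- and 64-dimensional inputs), and for brevity the sketch also penalizes the intercept.

```python
from itertools import combinations_with_replacement

def quadratic_features(v):
    """Return [1, v_i ..., v_i*v_j for i <= j ...] for one input vector v."""
    pairs = combinations_with_replacement(range(len(v)), 2)
    return [1.0] + list(v) + [v[i] * v[j] for i, j in pairs]

def feature_names(n):
    """Names matching the order produced by quadratic_features."""
    pairs = combinations_with_replacement(range(n), 2)
    return ["1"] + [f"v_{i}" for i in range(n)] + [f"v_{i}×v_{j}" for i, j in pairs]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ridge_fit(rows, targets, lam):
    """Minimize ||Phi w - y||^2 + lam * ||w||^2 via the normal equations."""
    phi = [quadratic_features(r) for r in rows]
    k = len(phi[0])
    gram = [[sum(p[i] * p[j] for p in phi) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(p[i] * y for p, y in zip(phi, targets)) for i in range(k)]
    return solve(gram, rhs)

def predict(w, row):
    return sum(wi * fi for wi, fi in zip(w, quadratic_features(row)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Toy stand-in for the perceptron output: an arbitrary quadratic of two inputs.
grid = [i / 10 for i in range(11)]
rows = [[a, b] for a in grid for b in grid]
targets = [0.8 * a - 0.5 * b + 0.4 * a * b for a, b in rows]

weights = ridge_fit(rows, targets, lam=1e-3)
r2 = r_squared(targets, [predict(weights, r) for r in rows])
# Terms whose fitted coefficient is large in absolute value (threshold is arbitrary).
large_terms = [(name, round(w, 2))
               for name, w in zip(feature_names(2), weights) if abs(w) > 0.3]
```

In the paper's setting, `rows` would hold the absolute embedding differences and `targets` the corresponding perceptron outputs; `large_terms` then plays the role of the coefficient listings in Tables 6, 7, 9, and 10.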