CSNN: Password Set Security Evaluation Method Based on Chinese Syllables and Neural Network
-
Abstract: Password guessing is the most direct attack for gaining access to an information system, and a password dictionary generated by an appropriate method allows the security of a system's password set to be evaluated accurately. This paper proposes a dictionary generation method for Chinese password sets, named Chinese Syllables and Neural Network-based password generation (CSNN). CSNN treats each complete Chinese (pinyin) syllable as an integral element and uses the spelling rules of Chinese syllables to parse passwords into structures and process them. The processed passwords are used to train a Long Short-Term Memory (LSTM) neural network, and the trained model then generates password dictionaries (guessing sets). The effectiveness of CSNN is evaluated with hit-rate experiments that compare its guessing sets against those of two classical approaches: Probabilistic Context-Free Grammar (PCFG) and a 5th-order Markov chain model. Guessing sets of different sizes are tested, and the overall performance of the CSNN guessing sets is better than that of both baselines. Compared with PCFG, at $10^7$ guesses the CSNN guessing sets raise the hit rate on the different test sets by 5.1%~7.4% (6.3% on average); compared with the 5th-order Markov chain model, at $8\times10^5$ guesses they raise it by 2.8%~12% (8.2% on average).
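The hit-rate comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes that the hit rate is the fraction of test-set passwords (counted with multiplicity) that appear among the first `budget` guesses of an ordered guessing set; the abstract does not spell out the exact definition, and the function name, parameters, and toy data below are hypothetical.

```python
def hit_rate(guesses, test_set, budget):
    """Fraction of test-set passwords covered by the first `budget` guesses.

    Assumption: duplicates in the test set are counted individually, so a
    popular password that is guessed contributes once per occurrence.
    """
    if not test_set:
        raise ValueError("test_set must not be empty")
    guessed = set(guesses[:budget])  # only the first `budget` guesses count
    hits = sum(1 for pwd in test_set if pwd in guessed)
    return hits / len(test_set)


# Toy data, for illustration only (not taken from the paper's datasets).
guesses = ["woaini123", "123456", "liwang520", "password"]
test_set = ["123456", "woaini123", "zhang2008", "123456", "qwerty"]

for budget in (1, 2, 4):
    print(budget, hit_rate(guesses, test_set, budget))
```

Evaluating the same function at several budgets gives the kind of hit-rate-versus-guess-number comparison that the abstract reports for CSNN, PCFG, and the Markov model.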
-
Table 1 Structure Parsing algorithm
Input: Training Set, allCSs
Intermediate result: the structure of the current password (thisStructure)
Output: password structure frequency table (Structure)

for password $\in$ Training Set do
    if Array_alphaStrings ← match_alphaStrings(password) then
        for alphaString $\in$ Array_alphaStrings do
            i, e ← index(alphaString), end(alphaString)
            if CSs ← match_CSs(alphaString) then
                Array_Ci, Array_Ce ← index(CSs), end(CSs)
                Queue_append(thisStructure, 'C', Array_Ci)
                Array_Li ← getsubStringIndex(i, e, Array_Ci, Array_Ce)
                Queue_append(thisStructure, 'L', Array_Li)
            else
                Queue_append(thisStructure, 'L', i)
            end if
        end for
    end if
    if Array_digitStrings ← match_digitStrings(password) then
        Array_Di ← index(Array_digitStrings)
        Queue_append(thisStructure, 'D', Array_Di)
    end if
    if Array_specialStrings ← match_specialStrings(password) then
        Array_Si ← index(Array_specialStrings)
        Queue_append(thisStructure, 'S', Array_Si)
    end if
    Structure.add(thisStructure)
end for
Structure.frequency()
return Structure

Table 2 Password Generation algorithm
Input: $\Sigma$, M
Output: password dictionary

count ← 0
while count < scale do
    nowStr ← getStr_rand($\Sigma$)
    nowStr ← strCat(nowStr, EOF)
    incoPwd ← STA
    for seg $\in$ nowStr do
        if seg $\in$ predict(M, incoPwd) then
            prediction ← selectSeg_rand(M, seg)
            tempPwd ← pwdCat(incoPwd, prediction)
            if len(printable(tempPwd)) <= Len and weight(printable(tempPwd)) >= T then
                incoPwd ← tempPwd
            else
                incoPwd ← NULL
                break
            end if
        else
            incoPwd ← NULL
            break
        end if
    end for
    if end(incoPwd) == EOF then
        dictionary.add(printable(incoPwd))
        ++count
    end if
end while
return dictionary

Table 3 Information on the password sets used in this paper
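The last four columns give the number of passwords and, in parentheses, the percentage of the passwords used.

| Password set | Service type | Original count | Used count | Containing letter strings | Containing pinyin | With two or more adjacent pinyin syllables | Consisting only of pinyin |
| Dodonew (嘟嘟牛) | E-commerce | 16,258,260 | 12,494,033 | 8,856,456 (70.9%) | 3,606,968 (28.9%) | 1,079,000 (8.6%) | 1,752,575 (14.0%) |
| CSDN | IT forum | 6,428,277 | 6,370,893 | 3,619,077 (56.8%) | 2,046,963 (32.1%) | 583,968 (9.2%) | 550,444 (8.6%) |
| 12306 | Railway ticketing | 129,303 | 129,303 | 95,373 (73.8%) | 39,544 (30.6%) | 10,861 (8.4%) | 17,146 (13.2%) |
| NetEase Mail (网易邮箱) | Email | 1,220,088,121 | 20,630,312 | 11,532,344 (55.9%) | 5,279,116 (25.6%) | 1,830,575 (8.9%) | 2,018,686 (10.6%) |

Table 4 The 18 most popular Chinese pinyin syllables in each password set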
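| Password set | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
| NetEase Mail (网易邮箱) | wo | li | ai | wang | yu | ni | ng | xiao | zhang | wei | liu | ji | yang | xi | chen | wu | hu | ma |
| Dodonew (嘟嘟牛) | wo | li | ai | ni | yu | wang | liu | xiao | zhang | wei | ng | ji | xu | chen | yang | hu | wu | xi |
| 12306 | wo | li | ai | ni | wang | yu | wei | xiao | liu | ji | zhang | ma | ng | chen | shi | an | yang | wu |
| CSDN | li | wo | de | yu | wang | ng | ji | liu | zhang | xiao | ai | wei | ma | xi | an | ni | chen | hu |

Table 5 Frequency distribution of password structures (%)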
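| Rank | NetEase Mail: structure | NetEase Mail: frequency | Dodonew: structure | Dodonew: frequency | 12306: structure | 12306: frequency | CSDN: structure | CSDN: frequency |
| 1 | D | 43.5 | LD | 31.8 | LD | 30.1 | D | 42.7 |
| 2 | LD | 22.7 | D | 29.0 | D | 27.2 | LD | 14.8 |
| 3 | CD | 6.4 | CD | 11.2 | CD | 10.4 | CD | 5.6 |
| 4 | LCD | 4.9 | DL | 7.6 | DL | 9.3 | LCD | 5.3 |
| 5 | DL | 4.4 | LCD | 6.4 | LCD | 6.9 | LC | 4.5 |
| 6 | LC | 3.9 | LC | 2.3 | CLD | 2.1 | DL | 4.3 |
| 7 | C | 1.5 | CLD | 1.4 | LC | 2.1 | LCL | 2.7 |
| 8 | DC | 1.1 | DC | 1.2 | LCLD | 1.7 | L | 1.8 |
| 9 | LCL | 0.9 | LCLD | 1.1 | DC | 1.2 | CLD | 1.7 |
| 10 | CLD | 0.9 | C | 1.0 | LDL | 1.1 | LCLD | 1.7 |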
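As a rough illustration of how the structure parsing algorithm could produce a structure frequency table like the one above, the following Python sketch tags each password with 'C' (Chinese syllable), 'L' (other letters), 'D' (digits) and 'S' (special characters) and then counts the resulting structures. Several parts are assumptions rather than the paper's method: the syllable set is a tiny hypothetical subset taken from the popular-syllable table (the full list allCSs is not reproduced here), syllables are found by greedy longest match, and adjacent segments of the same type are collapsed into one tag. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical subset of syllables; the paper's full list (allCSs) is not shown.
SYLLABLES = {"wo", "li", "ai", "wang", "yu", "ni", "xiao",
             "zhang", "wei", "liu", "chen", "yang"}
MAX_LEN = max(len(s) for s in SYLLABLES)

# Letter runs, digit runs, and runs of anything else (special characters).
TOKEN = re.compile(r"(?P<A>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def split_letters(run):
    """Split a letter run into ('C', syllable) and ('L', other) segments.

    Assumption: greedy longest match; real pinyin segmentation is ambiguous.
    """
    segments, pending, i = [], "", 0
    while i < len(run):
        for size in range(min(MAX_LEN, len(run) - i), 0, -1):
            if run[i:i + size] in SYLLABLES:
                if pending:  # flush non-syllable letters seen so far
                    segments.append(("L", pending))
                    pending = ""
                segments.append(("C", run[i:i + size]))
                i += size
                break
        else:  # no syllable starts here: treat the letter as plain 'L'
            pending += run[i]
            i += 1
    if pending:
        segments.append(("L", pending))
    return segments


def structure(password):
    """Return the structure string of a password, e.g. 'CD' or 'LD'."""
    tags = []
    for match in TOKEN.finditer(password):
        if match.lastgroup == "A":
            tags.extend(tag for tag, _ in split_letters(match.group().lower()))
        else:
            tags.append(match.lastgroup)
    # Assumption: adjacent segments of the same type collapse into one tag.
    return "".join(t for i, t in enumerate(tags) if i == 0 or t != tags[i - 1])


def structure_frequencies(passwords):
    """Map each structure to its frequency in percent, most common first."""
    counts = Counter(structure(p) for p in passwords)
    total = sum(counts.values())
    return {s: 100.0 * n / total for s, n in counts.most_common()}


# Toy passwords, for illustration only.
sample = ["woaini520", "zhangwei1988", "abc123", "123456", "liuxx@163"]
print(structure_frequencies(sample))
```

Under these assumptions a password such as `woaini520` is tagged `CD`, whereas a parser that ignores syllables would tag it `LD`; separating these two cases is what the 'C' tag in the table is for.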