Design and Implementation of a High-Speed PCIe Cipher Card Supporting GM Algorithms
-
Abstract: Cipher cards play an important role in information security, but the performance of current cipher cards is insufficient to meet the needs of high-speed network security services. This paper proposes a design and system implementation of a high-speed PCIe cipher card based on a MIPS64 multi-core processor. The card supports the Chinese commercial cryptographic (GM) algorithms SM2, SM3 and SM4, as well as international algorithms such as RSA, SHA and AES. The system consists of a hardware module, a cryptographic algorithm module, a host driver module and an interface calling module. An optimized implementation of SM3 is proposed that improves its performance by 19%. The host can send requests in Non-Blocking mode, so a single-process application can drive the cipher card at full load. With a 10-core CPU, the card reaches 18000 SM2 signatures/s and 4200 SM2 verifications/s, an SM3 hash speed of 2200 Mbps, and SM4 encryption/decryption speeds of 8/10 Gbps; several of these metrics reach a relatively high level. With a 16-core CPU at 1300 MHz, SM2/SM3 performance roughly doubles, and with a 48-core CPU the SM2 signature speed can reach 10^5 signatures/s.
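The abstract attributes the single-process speedup to Non-Blocking request submission. One plausible way to see why is a toy queueing model: if a blocking call keeps only one request in flight, at most one of the card's cores is busy at a time, whereas non-blocking submission can keep every core busy. The Python sketch below is an illustrative assumption, not the card's driver or the GM/T 0018-2012 interface; the function `simulate` and its parameters (`cores`, `service_time`, `window`) are hypothetical, and the model ignores per-request PCIe and driver overheads.

```python
import heapq


def simulate(num_requests, cores, service_time, window):
    """Toy model (hypothetical): throughput in requests/s of one host process.

    window == 1 models blocking calls (one request in flight at a time);
    window > 1 models non-blocking submission with up to `window`
    outstanding requests. Each of the card's `cores` serves one request
    at a time, taking `service_time` seconds per request.
    """
    if num_requests <= 0:
        raise ValueError("num_requests must be positive")
    core_free = [0.0] * cores   # time at which each card core becomes idle
    in_flight = []              # completion times of outstanding requests
    now = 0.0                   # host process's current time
    finish = 0.0                # completion time of the last request
    for _ in range(num_requests):
        if len(in_flight) >= window:
            # Window is full: host waits for the earliest outstanding result.
            now = max(now, heapq.heappop(in_flight))
        # The request starts on the earliest-idle core, once it is submitted.
        start = max(now, heapq.heappop(core_free))
        done = start + service_time
        heapq.heappush(core_free, done)
        heapq.heappush(in_flight, done)
        finish = max(finish, done)
    return num_requests / finish


print(simulate(1000, 10, 1.0, 1))   # blocking:     1.0
print(simulate(1000, 10, 1.0, 16))  # non-blocking: 10.0
```

With 10 cores the model predicts a tenfold gain, the same order as the roughly 900% improvement (about 10x) reported in Table 2; this agreement is suggestive only, since the real card's scheduling is not described in this section.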
-
Keywords:
- Cipher card
- PCIe bus
- GM algorithm
- Non-Blocking
-
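Table 1 Comparison of SM3 performance before and after optimization

| Input length (Byte) | Speed before optimization (Mbps) | Speed after optimization (Mbps) | Improvement (%) |
|---|---|---|---|
| 64 | 96 | 115 | 19.8 |
| 256 | 156 | 186 | 19.2 |
| 1 k | 185 | 220 | 18.9 |
| 4 k | 194 | 231 | 19.1 |
| 16 k | 196 | 233 | 18.9 |

Table 2 Single-process performance in blocking and non-blocking modes

| Request type | Blocking (times/s) | Non-blocking (times/s) | Improvement (%) |
|---|---|---|---|
| SM2 signature | 1710 | 17523 | 900 |
| SM2 verification | 418 | 4240 | 900 |
| RSA(2048) signature | 219 | 2200 | 900 |
| RSA(2048) verification | 2018 | 20232 | 900 |

Table 3 Performance comparison of cipher cards

| Cipher card | SM2 signature (times/s) | SM2 verification (times/s) | SM3 (Mbps) | SM4 (Gbps) | RSA2048 signature (times/s) | RSA2048 verification (times/s) | AES128 (Gbps) | SHA1 (Gbps) | SHA256 (Gbps) |
|---|---|---|---|---|---|---|---|---|---|
| SJK1572 | 14000 | 4000 | 1300 | 1.3 | – | – | – | – | – |
| SJK1120 | 1800 | 1300 | 1 | 1.2 | 30 | 350 | 1.2 | – | – |
| SJK1337 | 31000 | 19000 | 1700 | 2.2 | – | – | – | – | 0.8 |
| This card | 18000 | 4100 | 2200 | 8.1 | 2200 | 20232 | 9.0 | 13.0 | 13.0 |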
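Table 1 reports SM3 throughput before and after the authors' optimization, whose details are outside this section. As a correctness baseline only, a plain (unoptimized) software SM3 following the GM/T 0004-2012 description can be sketched in Python: padding as in SHA-256, 68+64 word message expansion, and 64 compression rounds. This is not the optimized scheme from the paper and says nothing about the card's throughput; the helper names (`rotl`, `compress`, `sm3`) are illustrative.

```python
import struct

# Initial value IV from GM/T 0004-2012.
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]
MASK = 0xFFFFFFFF


def rotl(x, n):
    """32-bit left rotation; n is reduced mod 32 (needed for T_j <<< j, j >= 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x):
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def p1(x):
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def compress(v, block):
    """One SM3 compression CF(V, B) on a 64-byte block."""
    # Message expansion: W_0..W_67, then W'_j = W_j ^ W_{j+4} for j = 0..63.
    w = list(struct.unpack(">16I", block))
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    w1 = [w[j] ^ w[j + 4] for j in range(64)]

    a, b, c, d, e, f, g, h = v
    for j in range(64):
        if j < 16:
            t, ff, gg = 0x79CC4519, a ^ b ^ c, e ^ f ^ g
        else:
            t = 0x7A879D8A
            ff = (a & b) | (a & c) | (b & c)
            gg = (e & f) | (~e & g)
        ss1 = rotl((rotl(a, 12) + e + rotl(t, j)) & MASK, 7)
        ss2 = ss1 ^ rotl(a, 12)
        tt1 = (ff + d + ss2 + w1[j]) & MASK
        tt2 = (gg + h + ss1 + w[j]) & MASK
        d, c, b, a = c, rotl(b, 9), a, tt1
        h, g, f, e = g, rotl(f, 19), e, p0(tt2)
    # Feed-forward: V_{i+1} = ABCDEFGH xor V_i.
    return [x ^ y for x, y in zip(v, (a, b, c, d, e, f, g, h))]


def sm3(msg: bytes) -> str:
    """Return the SM3 digest of msg as a lowercase hex string."""
    bit_len = len(msg) * 8
    msg += b"\x80"
    msg += b"\x00" * ((56 - len(msg) % 64) % 64)
    msg += struct.pack(">Q", bit_len)
    v = IV
    for i in range(0, len(msg), 64):
        v = compress(v, msg[i:i + 64])
    return "".join(f"{x:08x}" for x in v)


print(sm3(b"abc"))
# 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0
```

The "abc" digest above is the first example in GM/T 0004-2012; an optimized implementation such as the one measured in Table 1 should produce identical digests for every input length.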