RISCV密码专用处理器能效概率模型与体系结构研究

李伟; 别梦妮; 陈韬; 吴艾青; 南龙梅

doi:10.11999/JEIT210004

RISCV密码专用处理器能效概率模型与体系结构研究

doi: 10.11999/JEIT210004 cstr: 32379.14.JEIT210004

1.
解放军信息工程大学郑州 450000
2.
复旦大学专用集成电路与系统国家重点实验室上海 200433

详细信息

作者简介:
李伟：男，1983年生，副教授，博士生导师，研究方向为密码处理器设计、ASIC专用芯片设计

别梦妮：女，1997年生，硕士生，研究方向为智能化可重构芯片电路与架构

陈韬：男，1979年生，副教授，硕士生导师，研究方向为安全专用芯片设计

吴艾青：男，1997年生，硕士生，研究方向为智能化可重构芯片电路与架构

南龙梅：女，1981年生，博士生，研究方向为大规模集成电路设计、专用集成电路设计

通讯作者:
李伟　liwei12@fudan.edu.cn

中图分类号: TN918.4; TP316.4
计量
- 文章访问数: 1107
- HTML全文浏览量: 470
- PDF下载量: 93
- 被引次数: 0
出版历程
- 收稿日期: 2021-01-04
- 修回日期: 2021-04-07
- 网络出版日期: 2021-04-16
- 刊出日期: 2021-06-18

Research on Energy Efficiency Probability Model and Architecture of RISCV Cryptographic Processor

1.
PLA Information Engineering University, Zhengzhou 450000, China
2.
State Key Laboratory of ASIC and System, Fudan University, Shanghai 200433, China

摘要

摘要: 该文以高能效为目标，建立了密码专用处理器能效概率模型，并指导高能效密码专用处理器体系结构设计。该文将面向密码领域的专用指令处理器设计空间探索问题描述为“1”值在配置矩阵中的定位问题，通过引入概率矩阵进一步将定位问题转化为最优配置的概率问题，并基于机器学习思想提出了密码专用处理器最高能效概率模型。实验证明，该文提出的能效概率模型平均经过2300次迭代输出最终结果，且预测准确率达到92.7%。根据最高能效概率模型，对密码专用处理器设计空间进行探索，获取满足高能效需求的密码专用处理器运算单元集合，以扩展指令的方式将其集成到开源通用64位RISCV处理器核心Araine中，提出高能效密码专用处理器体系结构。将该处理器在CMOS 55 nm工艺下进行逻辑综合，结果表明，该文提出的RISCV密码专用处理器与扩展前相比面积增大了426874 μm²，关键延迟增加了0.51 ns，完成密码算法总时间面积积增幅之和为0.46，执行常见密码算法能效比在1.61～35.16 Mbps/mW范围内。
- 密码处理器 /
- 机器学习 /
- 能效概率模型 /
- 高能效
Abstract: This paper establishes an energy efficiency probability model for a dedicated cryptographic processor, and guides the design of the cryptographic processor. The design space exploration problem of a processor is designed as the positioning problem of "1" values in the configuration matrix. The probability matrix is introduced to transform the positioning problem into an optimal configuration probability problem. Based on the idea of machine learning, a probability model for the highest energy efficiency of a dedicated cryptographic processor is proposed. Experiments prove that the energy efficiency probability model in this paper outputs the final result after 2300 iterations on average, and the prediction accuracy rate reaches 92.7%. According to the highest energy efficiency probability model, a collection of computing units that meet high energy efficiency requirements can be obtained, and they are integrated into the open source general-purpose 64 bit RISCV processor core named Ariane. A dedicated processor for energy-efficient cryptography is built. The processor is synthesized under the CMOS 55 nm process, and the results show that compared with Ariane, the area of the proposed cryptographic processor increases by 426874 μm², the key delay increases by 0.51 ns, and the sum of the increasing total time area of the cryptographic algorithm is 0.46, the energy efficiency ratio of common cryptographic algorithms is within the range of 1.6～35.16 Mbps/mW.
- Cryptographic processor /
- Machine learning /
- Energy efficiency probability model /
- High energy efficiency

HTML全文

图 1 概率模型学习框架

下载: 全尺寸图片幻灯片

图 2 概率矩阵转移算法流程图

下载: 全尺寸图片幻灯片

图 3 各密码算法所需原RISCV指令条数

下载: 全尺寸图片幻灯片

图 4 探索获取密码运算单元集合参数

下载: 全尺寸图片幻灯片

图 5 RISCV密码处理器核心结构示意图

下载: 全尺寸图片幻灯片

图 6 RISCV密码处理器核心实现算法的时间面积积

下载: 全尺寸图片幻灯片

图 7 RISCV密码处理器能效分析

下载: 全尺寸图片幻灯片

表 1 参数列表

参数	约束	含义
${x_i}$	$1 \le i \le n$	可集成到处理器中的密码加速单元
${y_i}$	$1 \le l \le m$	算法集合中某目标密码算法
${t_i}$	$1 \le i \le n$	密码加速单元${x_i}$对应的关键延迟
${s_i}$	$1 \le i \le n$	密码加速单元${x_i}$对应的面积
${k_i}$	${k_i} \in \left[ {0,1} \right]$	表示单元${x_i}$是否为处理器扩展单元(1表示扩展)
${t_0}$		表示未扩展密码运算单元时处理器关键延迟
${s_0}$		表示为扩展密码运算单元时处理器面积
${C_l}$		一种密码运算单元扩展配置下完成${y_l}$算法所需时钟周期数
${C_{l0}}$		未扩展密码运算单元时完成${y_l}$算法所需时钟周期数

下载: 导出CSV

表 2 指令模板

	[31:26]	[25]	[24:20]	[1915]	[14:12]	[11:7]	[6:0]
RL	Funt7		Rsd2	Rsd1	Funt3	Funt5	opcode
SRI	Imm[5:0]	Funt1	Rsd2	Rsd1	Funt3	Rds	opcode
C	Funt7		Rs2	Rs1	Imm[7:0]		opcode

下载: 导出CSV

表 3 RISCV密码专用处理器性能

	Ariane	密码专用处理器
关键延迟(ns)	1.9	2.3
面积(μm²)	1321154	1748028

下载: 导出CSV

表 4 RISCV密码专用处理器实现算法所需指令周期数

	Ariane	密码专用处理器
AES	29450	402
DES	13472	384
SHA256	11725	1192
SNOWV	10508	1026
MD5	4210	196

下载: 导出CSV

表 5 密码处理器能效对比分析(Mbps/mW)

	文献[17]	文献[18]	密码专用处理器
AES	9.39	20.74	5.30
DES	2.91	6.91	3.03
SHA256	3.55	0.26	5.56
SNOWV	–	–	1.61
MD5	3.30	0.64	35.16

下载: 导出CSV

参考文献(18)

[1]	AZIZI O, MAHESRI A, LEE B C, et al. Energy-performance tradeoffs in processor architecture and circuit design: A marginal cost analysis[J]. ACM SIGARCH Computer Architecture News, 2010, 38(3): 26–36. doi: 10.1145/1816038.1815967
[2]	DUBACH C, JONES T M, and O’BOYLE M F P. Exploring and predicting the architecture/optimising compiler Co-design Space[C]. The 2008 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, Atlanta, USA, 2008: 31–40.
[3]	LEE B C and BROOKS D M. Accurate and efficient regression modeling for microarchitectural performance and power prediction[C]. The 12th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, USA, 2006: 185–194.
[4]	LEE B C and BROOKS D M. Illustrative design space studies with microarchitectural regression models[C]. The 13th IEEE International Symposium on High Performance Computer Architecture, Scottsdale, USA, 2007: 340–351.
[5]	LEE B C and BROOKS D. Applied inference: Case studies in microarchitectural design[J]. ACM Transactions on Architecture and Code Optimization, 2010, 7(2): 1–37.
[6]	PALERMO G, SILVANO C, and ZACCARIA V. ReSPIR: A response surface-based pareto iterative refinement for application-specific design space exploration[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2009, 28(12): 1816–1829. doi: 10.1109/TCAD.2009.2028681
[7]	CHEN Tianshi, GUO Qi, TANG Ke, et al. ArchRanker: A ranking approach to design space exploration[C]. The 41st ACM/IEEE International Symposium on Computer Architecture (ISCA), Minneapolis, USA, 2014: 85–96.
[8]	CHEN Tianshi, CHEN Yunji, GUO Qi, et al. Effective and efficient microprocessor design space exploration using unlabeled design configurations[J]. ACM Transactions on Intelligent Systems, 2014, 5(1): 20.
[9]	LI W, ZENG Xiaoyang, NAN Longmei, et al. A reconfigurable block cryptographic processor based on VLIW architecture[J]. China Communications, 2016, 13(1): 91–99. doi: 10.1109/CC.2016.7405707
[10]	孟沫舒. CISC处理器复杂指令的实现方法研究[D]. [硕士论文], 中国科学院研究生院, 2012.
[11]	SHAN Weiwei, ZHANG Shuai, XU Jiaming, et al. Machine learning assisted side-channel-attack countermeasure and its application on a 28-nm AES circuit[J]. IEEE Journal of Solid-State Circuits, 2020, 55(3): 794–804. doi: 10.1109/JSSC.2019.2953855
[12]	LI Wei, CHANG Zhongxiang, FENG Xiao, et al. Fast parallel extract-shift and parallel deposit-shift in general-purpose processors[C]. The 12th IEEE International Conference on Ubiquitous Intelligence and Computing, Beijing, China, 2015: 764–771.
[13]	马超, 李伟, 戴紫彬, 等. 新型可重构移位-置换单元研究与设计[J]. 电子学报, 2017, 45(5): 1025–1034. doi: 10.3969/j.issn.0372-2112.2017.05.001 MA Chao, LI Wei, DAI Zibin, et al. A novel reconfigurable rotation-permutation unit research and implementation[J]. Acta Electronica Sinica, 2017, 45(5): 1025–1034. doi: 10.3969/j.issn.0372-2112.2017.05.001
[14]	马超, 戴紫彬, 李伟, 等. RPRU: 一种面向处理器的比特抽取与移位统一架构[J]. 计算机研究与发展, 2018, 55(2): 426–437. doi: 10.7544/issn1000-1239.2018.20160775 MA Chao, DAI Zibin, LI Wei, et al. RPRU: A unified architecture for rotation and bit-extraction operations in general-propose processor[J]. Journal of Computer Research and Development, 2018, 55(2): 426–437. doi: 10.7544/issn1000-1239.2018.20160775
[15]	徐光明, 徐金甫, 常忠祥, 等. 序列密码非线性反馈移存器的可重构研究[J]. 计算机应用研究, 2015, 32(9): 2823–2826. doi: 10.3969/j.issn.1001-3695.2015.09.062 XU Guangming, XU Jinfu, CHANG Zhongxiang, et al. Reconfigurability study on nonlinear feedback shift registers in stream cipher[J]. Application Research of Computers, 2015, 32(9): 2823–2826. doi: 10.3969/j.issn.1001-3695.2015.09.062
[16]	LIU Bin and BAAS B M. Parallel AES encryption engines for many-core processor arrays[J]. IEEE Transactions on Computers, 2013, 62(3): 536–547. doi: 10.1109/TC.2011.251
[17]	LI Wei, ZENG Xiaoyang, DAI Zibin, et al. A high energy-efficient reconfigurable VLIW symmetric cryptographic processor with loop buffer structure and chain processing mechanism[J]. Chinese Journal of Electronics, 2017, 26(6): 1161–1167. doi: 10.1049/cje.2017.06.010
[18]	SAYILAR G and CHIOU D. Cryptoraptor: High throughput reconfigurable cryptographic processor[C]. 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, USA, 2014: 155–161.