高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于比特重组快速模约简的高面积效率椭圆曲线标量乘法器设计

刘志伟 张琦 黄海 杨晓秋 陈冠百 赵石磊 于斌

刘志伟, 张琦, 黄海, 杨晓秋, 陈冠百, 赵石磊, 于斌. 基于比特重组快速模约简的高面积效率椭圆曲线标量乘法器设计[J]. 电子与信息学报, 2024, 46(1): 344-352. doi: 10.11999/JEIT221446
引用本文: 刘志伟, 张琦, 黄海, 杨晓秋, 陈冠百, 赵石磊, 于斌. 基于比特重组快速模约简的高面积效率椭圆曲线标量乘法器设计[J]. 电子与信息学报, 2024, 46(1): 344-352. doi: 10.11999/JEIT221446
LIU Zhiwei, ZHANG Qi, HUANG Hai, YANG Xiaoqiu, CHEN Guanbai, ZHAO Shilei, YU Bin. Design of High Area Efficiency Elliptic Curve Scalar Multiplier Based on Fast Modulo Reduction of Bit Reorganization[J]. Journal of Electronics & Information Technology, 2024, 46(1): 344-352. doi: 10.11999/JEIT221446
Citation: LIU Zhiwei, ZHANG Qi, HUANG Hai, YANG Xiaoqiu, CHEN Guanbai, ZHAO Shilei, YU Bin. Design of High Area Efficiency Elliptic Curve Scalar Multiplier Based on Fast Modulo Reduction of Bit Reorganization[J]. Journal of Electronics & Information Technology, 2024, 46(1): 344-352. doi: 10.11999/JEIT221446

基于比特重组快速模约简的高面积效率椭圆曲线标量乘法器设计

doi: 10.11999/JEIT221446
基金项目: 国家重点研发计划重点专项(2018YFB2202101),中央引导地方科技发展专项(ZY20B11),黑龙江省普通高校基本科研业务费专项资金(2019KYYWF0214)
详细信息
    作者简介:

    刘志伟:男,讲师、博士生,研究方向为可重构计算、密码芯片设计、硬件安全、SoC芯片设计等

    张琦:男,硕士生,研究方向为信息安全、集成电路设计等

    黄海:男,教授,博士生导师,研究方向为信息安全、可重构技术、数字集成电路设计等

    杨晓秋:女,硕士生,研究方向为信息安全、集成电路设计等

    陈冠百:男,硕士生,研究方向为信息安全、集成电路设计等

    赵石磊:男,副教授,硕士生导师,研究方向为信息安全、密码芯片设计、密码芯片的安全设计等

    于斌:男,讲师,研究方向为密码算法、密码芯片设计、数字集成电路设计等

    通讯作者:

    黄海 ic@hrbust.edu.cn

  • 中图分类号: TN918

Design of High Area Efficiency Elliptic Curve Scalar Multiplier Based on Fast Modulo Reduction of Bit Reorganization

Funds: The National Key R&D Program of China (2018YFB2202101), Special Projects for the Central Government to Guide the Development of Local Science and Technology (ZY20B11), Fundamental Research Foundation for of Heilongjiang Province (2019KYYWF0214)
  • 摘要: 针对现有椭圆曲线密码标量乘法器难以兼顾灵活性和面积效率的问题,该文设计了一种基于比特重组快速模约简的高面积效率标量乘法器。首先,根据椭圆曲线标量乘的运算特点,设计了一种可实现乘法和模逆两种运算的硬件复用运算单元以提高硬件资源使用率,并采用Karatsuba-Ofman算法提高计算性能。其次,设计了基于比特重组的快速模约简算法,并实现了支持secp256k1, secp256r1和SCA-256(SM2标准推荐曲线)快速模约简计算的硬件架构。最后,对点加和倍点的模运算操作调度进行了优化,提高乘法与快速模约简的利用率,降低了标量乘计算所需的周期数量。所设计的标量乘法器在55 nm CMOS工艺下需要275 k个等效门,标量乘运算速度为48309次/s,面积时间积达到5.7。
  • 图  1  资源共享的运算单元

    图  2  兼容3条曲线快速模约简结构

    图  3  高速点加和倍点的模运算操作调度

    图  4  标量乘法器

    算法1 NAF标量乘算法
     输入:$ k,P $
     输出:$ Q = kP $
     步骤1:计算${\text{NAF} }\left( k \right) = \left\{ { {k_{m - 1} }, \cdots ,{k_0} } \right\}$
     步骤2:$ Q \leftarrow \infty $
     步骤3:$ {\text{for }}i \leftarrow m - 1{\text{ down to }}0{\text{ do}} $
         $ Q \leftarrow 2Q $
          $ {\text{if }}{k_i} = 1{\text{ then}} $ $ Q \leftarrow Q + P $
          $ {\text{if }}{k_i} = - 1{\text{ then}} $ $ Q \leftarrow Q - P $
     步骤4:返回$ Q $
    下载: 导出CSV
    算法2 改进的Karatsuba算法操作调度
     输入:$ A={a}_{1}{2}^{128}+{a}_{0},\;B={b}_{1}{2}^{128}+{b}_{0} $
     输出:$ C = A \cdot B $
     1:$ s = {a_1}\cdot{b_1} $;$ v\left[ {128:0} \right] = {\text{ }}{a_0} + {a_1} $;
       $ v\left[ {257:129} \right] = {b_0} + {b_1} $;
     2:$ r = {a_0}\cdot{b_0} $;$ {c_1} = r + s $;$u = {\text{~}}{c_1} + 1$;
     3:$ v = v\left[ {128:0} \right] \cdot v\left[ {257:129} \right] $;$ {c_1} = v + u $;
       $ {c_2} = \left\{ {s,r\left[ {255:128} \right]} \right\} + {c_1} $;
     4:返回$ C = \{ {c_2},r\left[ {127:0} \right]\} $
    下载: 导出CSV
    算法3 secp256k1快速模约简算法
     输入:$ C = {c_{127}}{2^{508}} + {c_{126}}{2^{504}} + \cdots + {c_1}{2^4} + {c_0} $, $0 \le C < {p^2}$
     输出:$ C\text{mod} p $
     (1) $ {s_1} = ({c_{63}}, \cdots ,{c_0}) $;
       $ {s_2} = ({c_{127}}, \cdots ,{c_{64}}) $;
       $ {s_3} = ({c_{126}}, \cdots ,{c_{64}},{c_{127}}) $;
       $ {s_4} = (2'h0,{c_{124}}, \cdots ,{c_{64}},2'h0,{c_{127}},4'h0) $;
       $ {s_5} = ({c_{119}}, \cdots ,{c_{64}},{c_{127}}, \cdots ,{c_{120}}) $;
       $ {s_6} = (214'h0,{c_{127}},2'h0,{c_{127}}, \cdots ,{c_{120}},4'h0) $;
       $ {s_7} = (214'h0,{c_{127}}, \cdots ,{c_{120}},{c_{126}},{c_{126}},2'h0) $;
       $ {s_8} = (192'h0,{c_{127}}, \cdots ,{c_{120}},12'h0,{c_{127}},2'h0,$
          ${c_{127}},10'h0) $;
       $ {s_9} = (218'h0,{c_{126}},18'h0,{c_{126}},{c_{127}},8'h0) $;
       $ {s_{10}} = (220'h0,{c_{127}},18'h0,{c_{127}},10'h0) $;
       $ {s_{11}} = ({c_{125}},254'h0) $;
       $ {s_{12}} = (2'h0,{c_{125}}, \cdots ,{c_{64}},{c_{127}},2'h0) $;
       $ {s_{13}} = (218'h0,{c_{127}}, \cdots ,{c_{120}},6'h0) $;
       $ {s_{14}} = (218'h0,{c_{127}},18'h0,{c_{127}},2'h0,{c_{127}},6'h0) $;
       $ {s_{15}} = (240'h0,{c_{127}},{c_{126}},8'h0) $;
       $ {s_{16}} = ({c_{126}},254'h0) $;
       $ {{\text{z}}_1} = {s_1} + {s_2} + {s_3} + {s_4} + {s_5} + {s_6} + {s_7} + {s_8} + {s_9} + {s_{10}} + $
       ${s_{11}} - {s_{12}} - {s_{13}} - {s_{14}} - {s_{15}} - {s_{16}} $;
       $ = ({r_{65}}, \cdots ,{r_0}) $
     (2) $ {t_1} = ({r_{63}}, \ldots ,{r_0}) $;
       $ {t_2} = \left( {216'h0,{\text{ }}{r_{65}},{r_{64}},20'h0,{\text{ }}{r_{65}},{r_{65}},{r_{64}}} \right) $;
       $ {t_3} = \left( {235'h0,{\text{ }}{r_{65}},{r_{64}},2'h0,{\text{ }}{r_{64}},4'h0} \right) $;
       $ {t_4} = \left( {242'h0,{\text{ }}{r_{65}},{r_{64}},6'h0} \right) $;
       $ {z_2} = {\text{ }}{t_1} + {t_2} + {t_3} - {t_4} $
     (3)若$({z_1} \ge 0)$,返回$ \left( {{z_2}\text{mod} p} \right) $;
       若$ ( - p < {z_1} < 0) $,返回$ \left( {{z_1} + p} \right) $;
       若$( - 2p < {z_1} \le - p)$,返回$ \left( {{z_1} + 2p} \right) $;
       若$( - 3p < {z_1} \le- 2p)$,返回$ \left( {{z_1} + 3p} \right) $;
    下载: 导出CSV

    表  1  主要模块面积

    模块NAF编码器资源共享
    运算单元
    模约简标量乘法器
    面积(K·Gates/LUTs)1615145275
    下载: 导出CSV

    表  2  椭圆曲线操作周期

    乘法模约简模加/减模逆NAF编码倍点倍点点加标量乘
    周期331360256276010366
    下载: 导出CSV

    表  3  256位素数域下椭圆曲线标量乘法器的性能比较

    方案工艺(nm)曲线主频(MHz)周期面积(K·Gates/LUTs)单次时间(μs)吞吐量(次/s)AT
    本文55secp256k1/r1, SCA-2565001036627520.7483095.7
    文献[22]55secp256r1300795530627377128.1
    文献[9]552561364582001891450690274
    文献[10]65256546.53973004477301370326.3
    文献[11]65secp256r11882300350012.58000043.7
    本文130secp256k1/r1, SCA-2562431036631442.52352913.3
    文献[6]130SCA-25625014240280571754415.9
    文献[12]13025615061000057.14070246232
    文献[14]130SCA-25621445500208211474044.2
    文献[13]180secp256r118536800144.8711408428.9
    本文Virtex-7secp256k1/r1, SCA-256371036647.4279358413.2
    文献[24]Virtex-7256177.7262.732.7147167948.5
    文献[25]Virtex-725672.9215.996.92957338286.8
    下载: 导出CSV
  • [1] ANSI. ANSI X9.63 Public key cryptography for the financial services industry: Elliptic curve key agreement and key transport protocols[S]. 2000.
    [2] BARKER E. Digital signature standard[R]. Gaithersburg: National Institute of Standards and Technology, 2000.
    [3] Institute of Electrical and Electronic Engineers. 1363-2000 IEEE standard specifications for public-key cryptography[S]. IEEE, 2000.
    [4] JAVEED K, SAEED K, and GREGG D. High-speed parallel reconfigurable Fp multipliers for elliptic curve cryptography applications[J]. International Journal of Circuit Theory and Applications, 2022, 50(4): 1160–1173. doi: 10.1002/cta.3206
    [5] WANG Huaqun, HE Debiao, and JI Yimu. Designated-verifier proof of assets for bitcoin exchange using elliptic curve cryptography[J]. Future Generation Computer Systems, 2020, 107: 854–862. doi: 10.1016/j.future.2017.06.028
    [6] HU Xianghong, ZHENG Xin, ZHANG Shengshi, et al. A high-performance elliptic curve cryptographic processor of SM2 over GF(p)[J]. Electronics, 2019, 8(4): 431. doi: 10.3390/electronics8040431
    [7] HUANG Hai, NA Ning, XING Lin, et al. An improved wNAF Scalar-multiplication algorithm with low computational complexity by using prime precomputation[J]. IEEE Access, 2021, 9: 31546–31552. doi: 10.1109/ACCESS.2021.3061124
    [8] 赵石磊, 杨晓秋, 刘志伟, 等. 一种低复杂度的改进wNAF标量乘算法[J]. 电子学报, 2022, 50(4): 977–983. doi: 10.12263/DZXB.20211016

    ZHAO Shilei, YANG Xiaoqiu, LIU Zhiwei, et al. An Improved wNAF Scalar-Multiplication Algorithm With Low Computational Complexity[J]. Acta Electronica Sinica, 2022, 50(4): 977–983. doi: 10.12263/DZXB.20211016
    [9] LIU Zilong, LIU Dongsheng, and ZOU Xuecheng. An efficient and flexible hardware implementation of the dual-field elliptic curve cryptographic processor[J]. IEEE Transactions on Industrial Electronics, 2017, 64(3): 2353–2362. doi: 10.1109/TIE.2016.2625241
    [10] HOSSAIN M S, KONG Yinan, SAEEDI E, et al. High-performance elliptic curve cryptography processor over NIST prime fields[J]. IET Computers & Digital Techniques, 2017, 11(1): 33–42. doi: 10.1049/iet-cdt.2016.0033
    [11] LIU Jianwei, CHENG Dongxu, GUAN Zhenyu, et al. A high speed VLSI implementation of 256-bit scalar point multiplier for ECC over GF(p)[C]. Proceedings of 2018 IEEE International Conference on Intelligence and Safety for Robotics, Shenyang, China, 2018: 184–191.
    [12] HU Xianghong, ZHENG Xin, ZHANG Shengshi, et al. A low hardware consumption elliptic curve cryptographic architecture over GF(p) in embedded application[J]. Electronics, 2018, 7(7): 104. doi: 10.3390/electronics7070104
    [13] CHOI P, LEE M K, KIM J H, et al. Low-complexity elliptic curve cryptography processor based on configurable partial modular reduction over NIST prime fields[J]. IEEE Transactions on Circuits and Systems II:Express Briefs, 2018, 65(11): 1703–1707. doi: 10.1109/TCSII.2017.2756680
    [14] ZHANG Dan and BAI Guoqiang. Ultra high-performance ASIC implementation of SM2 with power-analysis resistance[C]. Proceedings of 2015 IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC), Singapore, 2015: 523–526.
    [15] ISLAM M M, HOSSAIN M S, SHAHJALAL M, et al. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography[J]. IEEE Access, 2020, 8: 73898–73906. doi: 10.1109/ACCESS.2020.2988379
    [16] HANKERSON D, VANSTONE S, and MENEZES A. Guide to Elliptic Curve Cryptography[M]. New York: Springer, 2004: 312.
    [17] LUCCA A V, SBORZ G A M, LEITHARDT V R Q, et al. A review of techniques for implementing elliptic curve point multiplication on hardware[J]. Journal of Sensor and Actuator Networks, 2021, 10(1): 3. doi: 10.3390/jsan10010003
    [18] 王敏, 吴震. 抗SPA攻击的椭圆曲线NAF标量乘实现算法[J]. 通信学报, 2012, 33(S1): 228–232.

    WANG Min and WU Zhen. Algorithm of NAF scalar multiplication on ECC against SPA[J]. Journal on Communications, 2012, 33(S1): 228–232.
    [19] SOLINAS J A. Efficient arithmetic on Koblitz curves[J]. Designs, Codes and Cryptography, 2000, 19(2): 195–249. doi: 10.1023/A:1008306223194
    [20] KARATSUBA A. Multiplication of multidigit numbers on automata[J]. Soviet Physics Doklady, 1963, 7: 595–596.
    [21] CHOI P, LEE M K, KONG J T, et al. Efficient design and performance analysis of a hardware right-shift binary modular inversion algorithm in GF(p)[J]. Journal of Semiconductor Technology and Science, 2017, 17(3): 425–437. doi: 10.5573/JSTS.2017.17.3.425
    [22] HU Xianghong, LI Xueming, ZHENG Xin, et al. A high speed processor for elliptic curve cryptography over NIST prime field[J]. IET Circuits, Devices & Systems, 2022, 16(4): 350–359. doi: 10.1049/CDS2.12110
    [23] 于斌, 黄海, 刘志伟, 等. 面向多椭圆曲线的高速标量乘法器设计与实现[J]. 通信学报, 2020, 41(12): 100–109. doi: 10.11959/j.issn.1000-463X.2020226

    YU Bin, HUANG Hai, LIU Zhiwei, et al. Design and implementation of high-speed scalar multiplier for multi-elliptic curve[J]. Journal on Communications, 2020, 41(12): 100–109. doi: 10.11959/j.issn.1000-463X.2020226
    [24] ISLAM M M, HOSSAIN M S, HASAN M K, et al. FPGA implementation of high-speed area-efficient processor for elliptic curve point multiplication over prime field[J]. IEEE Access, 2019, 7: 178811–178826. doi: 10.1109/ACCESS.2019.2958491
    [25] ASIF S, HOSSAIN M S, and KONG Yinan. High-throughput multi-key elliptic curve cryptosystem based on residue number system[J]. IET Computers & Digital Techniques, 2017, 11(5): 165–172. doi: 10.1049/iet-cdt.2016.0141
  • 加载中
图(4) / 表(6)
计量
  • 文章访问数:  196
  • HTML全文浏览量:  100
  • PDF下载量:  11
  • 被引次数: 0
出版历程
  • 收稿日期:  2022-11-17
  • 修回日期:  2023-06-15
  • 网络出版日期:  2023-06-22
  • 刊出日期:  2024-01-17

目录

    /

    返回文章
    返回