高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

格基后量子密码双域可重构多项式乘法运算单元架构研究

陈韬 赵旺鹏 别梦妮 李伟 南龙梅 杜怡然 付秋兴

陈韬, 赵旺鹏, 别梦妮, 李伟, 南龙梅, 杜怡然, 付秋兴. 格基后量子密码双域可重构多项式乘法运算单元架构研究[J]. 电子与信息学报. doi: 10.11999/JEIT250929
引用本文: 陈韬, 赵旺鹏, 别梦妮, 李伟, 南龙梅, 杜怡然, 付秋兴. 格基后量子密码双域可重构多项式乘法运算单元架构研究[J]. 电子与信息学报. doi: 10.11999/JEIT250929
CHEN Tao, ZHAO Wangpeng, BIE Mengni, LI Wei, NAN Longmei, DU Yiran, FU Qiuxing. Research on the Architecture of Dual-Field Reconfigurable Polynomial Multiplication Unit for Lattice-Based Post-Quantum Cryptography[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250929
Citation: CHEN Tao, ZHAO Wangpeng, BIE Mengni, LI Wei, NAN Longmei, DU Yiran, FU Qiuxing. Research on the Architecture of Dual-Field Reconfigurable Polynomial Multiplication Unit for Lattice-Based Post-Quantum Cryptography[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250929

格基后量子密码双域可重构多项式乘法运算单元架构研究

doi: 10.11999/JEIT250929 cstr: 32379.14.JEIT250929
基金项目: 国家自然科学基金(62302519)
详细信息
    作者简介:

    陈韬:男,副教授,研究方向为安全专用芯片设计、后量子公钥密码算法芯片设计

    赵旺鹏:男,硕士,研究方向为后量子公钥密码算法硬件实现

    别梦妮:女,博士,研究方向为后量子密码处理器设计

    李伟:男,教授,研究方向为密码处理器设计、ASIC专用芯片设计

    南龙梅:女,副教授,研究方向为规模集成电路设计、专用集成电路设计

    杜怡然:男,副教授,研究方向为可重构密码芯片设计

    付秋兴:男,博士,研究方向为全同态密码处理器设计

    通讯作者:

    赵旺鹏 fenghuwuyu2025@163.com

  • 中图分类号: TN402; TP309

Research on the Architecture of Dual-Field Reconfigurable Polynomial Multiplication Unit for Lattice-Based Post-Quantum Cryptography

Funds: The National Natural Science Foundation of China (62302519)
  • 摘要: 多项式乘法作为格密码算法的核心运算,通过快速数论变换(NTT)/快速傅里叶变换(FFT)设计成多项式乘法架构能够降低计算复杂度,提升格密码算法的运算速度。为支持多参数的多项式乘法运算,并提升多项式乘法的运算速度,该文提出一种双域可重构多项式乘法运算单元架构。该文首先基于Kyber、Dilithium和Falcon算法中的参数特征,提取多项式乘法的运算网络,并对内部双域乘法运算在算法层进行优化设计。其次,基于多项式乘法运算网络设计出双域可重构多项式乘法运算单元架构,并对双域可重构乘法单元进行优化设计以提升运算速度。最后,为提高运算单元架构的资源利用率,该文对多项式乘法运算单元架构进行并行度分析,当支持1路64 bit、2路32 bit或4路16 bit的运算时,多项式乘法运算架构的面积效率最高。该文在Xilinx FPGA xc7v2000tflg1925上进行实验验证,能够支持Kyber、Dilithium和Falcon算法中所有多项式乘法运算,工作频率达到169 MHz,面积时间积降低了50%以上。
  • 图  1  多项式乘法运算网络

    图  2  双域可重构多项式乘法运算单元整体架构

    图  3  双域可重构乘法单元

    图  4  双域乘法器结构图

    图  5  不同并行度下多项式乘法运算单元架构资源消耗对比

    图  6  不同并行设计的运算单元架构面积效率对比

    图  7  双域并行乘法器结构

    表  1  格基后量子密码多项式乘法参数分析

    算法系数类型位宽(bit)项数n模数q
    Kyber整数122563329≡1mod n
    Dilithium整数232568380417≡1mod 2n
    Falcon整数14512/102412289≡1mod 2n
    浮点数形式的复数64512/1024-
    下载: 导出CSV

    表  2  多项式乘法运算提取

    KyberDilithium/Falcon
    变换$ \left.\begin{array}{c}{a}^{\prime}=a+b\cdot \omega \\{b}^{\prime}=a-b\cdot \omega \end{array}\right\}\Rightarrow \begin{array}{c}a+b\cdot c\\a-b\cdot c\end{array} $$ \left.\begin{array}{c}{a}^{\prime}=a+b\cdot \omega \\{b}^{\prime}=a-b\cdot \omega \end{array}\right\}\Rightarrow \begin{array}{c}a+b\cdot c\\a-b\cdot c\end{array} $
    点乘$ \left.\begin{array}{c}{\hat{c}}_{2i}=({\hat{a}}_{2i}\cdot {\hat{b}}_{2i})+({\hat{a}}_{2i+1}\cdot {\hat{b}}_{2i+1})\cdot \omega \\{\hat{c}}_{2i+1}=({\hat{a}}_{2i}\cdot {\hat{b}}_{2i+1})+{\hat{a}}_{2i+1}\cdot {\hat{b}}_{2i}\end{array}\right\}\Rightarrow \begin{array}{c}b\cdot c\\a+d\cdot e\end{array} $$ {c}_{i}={a}_{i}\cdot {b}_{i}\Rightarrow b\cdot c $
    逆变换$ \left.\begin{array}{c}{a}^{\prime}=(a+b)/2\\{b}^{\prime}=\omega \cdot (a-b)/2\end{array}\right\}\Rightarrow \begin{array}{c}(a+b)/2\\\omega \cdot (a-b)/2\end{array} $$ \left.\begin{array}{c}{a}^{\prime}=(a+b)/2\\{b}^{\prime}=\omega \cdot (a-b)/2\end{array}\right\}\Rightarrow \begin{array}{c}(a+b)/2\\\omega \cdot (a-b)/2\end{array} $
    下载: 导出CSV
    算法1:Montgomery模乘 算法2:Barrett模乘 算法3:K-RED模乘
    输入:$ A,B,R,P,{P}^{\prime} $ 输入:$ a,b,m=\left\lfloor {2}^{k}/q\right\rfloor ;k\approx 2\left\lceil \log q\right\rceil $ 输入:$ a,b,q=k\times {2}^{m}+1 $
    输出:$ u=A\times B\times {R}^{-1}(\mathrm{mod}P) $ 输出:$ O=a\times b(\mathrm{mod}q) $ 输出:$ {C}^{\prime}=k\times a\times b\mathrm{mod}q $
    1: $ t=A\times B $ 1: $ O=a\times b $ 1: $ C=a\times b $
    2: $ s=(t+(t\times {P}^{\prime}(\mathrm{mod}R)\times P)))/R $ 2: $ t=(O\times m)\gg k $ 2: $ {C}_{l}=C\mathrm{mod}{2}^{m} $
    3: $ if(s \gt P) $ 3: $ O=O-(t\times q) $ 3: $ {C}_{h}=C\gg m $
    4:  $ s=s-P $ 4: $ if(O\geq q) $ 4: $ {C}^{\prime}=k{C}_{l}-{C}_{h} $
    5: $ returns $ 5:  $ O=O-q $ 5: $ return{C}^{\prime} $
    6: $ returnO $
    下载: 导出CSV

    4  双域可重构乘法单元

     输入:$ \mathrm{a}\_ \mathrm{r},\mathrm{a}\_ \mathrm{i},\mathrm{b}\_ \mathrm{r},\mathrm{b}\_ \mathrm{i},\mathrm{q},\mathrm{m},\mathrm{k},\mathrm{sel} $
     输出:$ \mathrm{c}\_ \mathrm{r},\mathrm{c}\_ \mathrm{i} $
     1: $ \mathrm{mul}\_ 1=\mathrm{mul}(\mathrm{a}\_ \mathrm{r},\mathrm{b}\_ \mathrm{r}) $
     2: $ \mathrm{add}\_ 1=\mathrm{add}(\mathrm{a}\_ \mathrm{r},\mathrm{a}\_ \mathrm{i}) $
     3: $ \mathrm{add}\_ 2=\mathrm{add}(\mathrm{b}\_ \mathrm{r},\mathrm{b}\_ \mathrm{i}) $
     /////////////sel=1,完成模乘运算;sel=0,完成复数乘法
     //////////////////
     4: $ \mathrm{if}(\mathrm{sel}=1) $
     5:  $ \mathrm{mul}\_ 2=\mathrm{mul}(\mathrm{mul}\_ 1[\mathrm{w}-1\colon 0],\mathrm{m}) $
     6:  $ \mathrm{mul}\_ 3=\mathrm{mul}(\mathrm{mul}\_ 1[2\mathrm{w}-1\colon \mathrm{w}],\mathrm{m}) $
     7: $ \mathrm{else} $
     8:  $ \mathrm{mul}\_ 2=\mathrm{mul}(\mathrm{a}\_ \mathrm{i},\mathrm{b}\_ \mathrm{i}) $
     9:  $ \mathrm{mul}\_ 3=\mathrm{mul}(\mathrm{add}\_ 1,\mathrm{add}\_ 2) $
     10: $ \mathrm{if}(\mathrm{sel}=1) $
     11: $ \mathrm{add}\_ 3=\mathrm{add}(\mathrm{mul}\_ 2,\mathrm{mul}\_ 3) $
     12: $ \mathrm{add}\_ {3}^{\prime}=\mathrm{add}\_ 3 \gt > \mathrm{k} $
     13: $ \mathrm{add}\_ {3}^{\prime\prime}=\mathrm{mul}\_ \mathrm{q}(\mathrm{add}\_ {3}^{\prime}) $
     14: $ \mathrm{else} $
     15: $ \mathrm{add}\_ 3=\mathrm{add}(\mathrm{mul}\_ 1,\mathrm{mul}\_ 2) $
     16: $ \mathrm{if}(\mathrm{sel}=1) $
     17: $ \mathrm{sub}\_ 1=\mathrm{sub}(\mathrm{mul}\_ 1,\mathrm{add}\_ {3}^{\prime\prime}) $
     18: $ \mathrm{sub}\_ 2=\mathrm{sub}(\mathrm{sub}\_ 1,\mathrm{q}) $
     19: $ \mathrm{if}(\mathrm{sub}\_ 1 \lt \mathrm{q}) $
     20:   $ \mathrm{c}\_ \mathrm{r}=\mathrm{sub}\_ 1;\mathrm{c}\_ \mathrm{i}=0 $
     21: $ \mathrm{else} $
     22:   $ \mathrm{c}\_ \mathrm{r}=\mathrm{sub}\_ 2;\mathrm{c}\_ \mathrm{i}=0 $
     23: $ \mathrm{else} $
     24: $ \mathrm{sub}\_ 1=\mathrm{sub}(\mathrm{mul}\_ 1,\mathrm{mul}\_ 2) $
     25: $ \mathrm{sub}\_ 2=\mathrm{sub}(\mathrm{mul}\_ 3,\mathrm{add}\_ 3) $
     26: $ \mathrm{c}\_ \mathrm{r}=\mathrm{sub}\_ 1;\mathrm{c}\_ \mathrm{i}=\mathrm{sub}\_ 2 $
     27:$ \mathrm{return}\mathrm{c}\_ \mathrm{r},\mathrm{c}\_ \mathrm{i} $
    下载: 导出CSV

    表  3  运算单元架构子模块FPGA资源对比

    硬件单元LUTFFDSP
    add59000
    sub47500
    div14300
    add_646400
    add_9628800
    mul_q3711280
    sub_6425600
    下载: 导出CSV

    表  4  多项式乘法单元性能对比

    文献频率
    (MHz)
    时钟
    周期
    时延
    (us)
    器件n系数位宽资源消耗
    (LUT/FF/DSP/BRAM)
    ATP算法
    电子学报2025[15]169204**0.4V725616bit18120/4000/64/134.79Kyber
    444**1.5651214bit75.72Falcon
    TCAS-I 2022[19]2241928.571A725613bit25674/3137/64/6317.22Kyber
    Dilithium
    TCAS-I
    2024[4]
    3346381.91A725613bit1052/1058/0/13.511.77Kyber
    TCAD
    2023[20]
    15414709.55V751214bit28616/4211/24/48473.94Falcon
    TCAS-I
    2023[7]
    1408676.19A725623bit11100/8700/32/4149.80Dilithium
    TCAS-II
    2025[13]
    362--US+102464bit11408/11634/60/20.50.097*FFT
    本文1698595.08V7256<16bit6522/3018/48/580.47Kyber
    188311.15512<16bit176.62Falcon
    169110.01256<32bit158.56Dilithium
    745144.11102464bit698.70
    (0.094*)
    FFT
    注:1.在上述FPGA型号中,US+为Zynq UltraScale+,V7为Virtex-7,A7为Artix-7;
    2.*表示处理单次运算的ATP值;
    3.**表示由于未查询到整个多项式乘法流程消耗的资源,因此将NTT消耗的时钟进行3倍处理;
    下载: 导出CSV
  • [1] BOS J, DUCAS L, KILTZ E, et al. CRYSTALS - Kyber: A CCA-secure module-lattice-based KEM[C]. Proceedings of 2018 IEEE European Symposium on Security and Privacy, London, UK, 2018: 353–367. doi: 10.1109/EuroSP.2018.00032.
    [2] DUCAS L, KILTZ E, LEPOINT T, et al. CRYSTALS-Dilithium: A lattice-based digital signature scheme[J]. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2018, 2018(1): 238–268. doi: 10.13154/tches.v2018.i1.238-268.
    [3] FOUQUE P A, HOFFSTEIN J, KIRCHNER P, et al. Falcon: Fast-Fourier lattice-based compact signatures over NTRU[R]. Submission to the NIST’s Post-Quantum Cryptogr. Standardization Process, 2020. (查阅网上资料, 未找到本条文献出版信息, 请确认).
    [4] ALHASSANI A and BENAISSA M. High-speed polynomials multiplication HW accelerator for CRYSTALS-Kyber[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2024, 71(12): 6105–6113. doi: 10.1109/TCSI.2024.3427011.
    [5] LI Aobo, LU Jiahao, LIU Dongsheng, et al. A 40nm 2.76μJ/Op energy-efficient secure post-quantum crypto-processor for crystals-Kyber on module-LWE[C]. Proceedings of 2023 IEEE Asian Solid-State Circuits Conference (A-SSCC), Haikou, China, 2023: 1–3. doi: 10.1109/A-SSCC58667.2023.10347915.
    [6] LI Xiang, LU Jiahao, LIU Dongsheng, et al. A high speed post-quantum crypto-processor for crystals-dilithium[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024, 71(1): 435–439. doi: 10.1109/TCSII.2023.3304416.
    [7] HU Xiao, TIAN Jing, LI Minghao, et al. AC-PM: An area-efficient and configurable polynomial multiplier for lattice based cryptography[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(2): 719–732. doi: 10.1109/TCSI.2022.3218192.
    [8] 陈韬, 李慧琴, 吴艾青, 等. 基于2KNTT的多项式乘法单元设计[J]. 电子学报, 2024, 52(2): 455–467. doi: 10.12263/DZXB.20220629.

    CHEN Tao, LI Huiqin, WU Aiqing, et al. A polynomial multiplier design based on 2KNTT[J]. Acta Electronica Sinica, 2024, 52(2): 455–467. doi: 10.12263/DZXB.20220629.
    [9] XU Tianyu, CUI Yijun, LIU Dongsheng, et al. Lightweight and efficient hardware implementation for saber using NTT multiplication[C]. Proceedings of 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Shenzhen, China, 2022: 601–605. doi: 10.1109/APCCAS55924.2022.10090310.
    [10] AIKATA A, MERT A C, IMRAN M, et al. KaLi: A crystal for post-quantum security using Kyber and dilithium[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(2): 747–758. doi: 10.1109/TCSI.2022.3219555.
    [11] LU Jiahao, ZHANG Jiaming, LUO Zhixiang, et al. An efficient and configurable hardware architecture of polynomial modular operation for CRYSTALS-Kyber and dilithium[C]. Proceedings of 2024 IEEE 67th International Midwest Symposium on Circuits and Systems (MWSCAS), Springfield, USA, 2024: 29–32. doi: 10.1109/MWSCAS60917.2024.10658892.
    [12] ZHU Yihong, ZHU Wenping, OUYANG Yi, et al. 16.2 A 28nm 69.4kOPS 4.4μJ/Op versatile post-quantum crypto-processor across multiple mathematical problems[C]. Proceedings of 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2024: 298–300. doi: 10.1109/ISSCC49657.2024.10454332.
    [13] WANG Jianfei, YANG Chen, HOU Jia, et al. A compact and efficient hardware accelerator for RNS-CKKS En/decoding and En/decryption[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2025, 72(1): 243–247. doi: 10.1109/TCSII.2024.3454024.
    [14] 陈韬, 李慧琴, 李伟, 等. 面向格基后量子密码算法的可重构多项式乘法架构[J]. 电子与信息学报, 2023, 45(9): 3380–3392. doi: 10.11999/JEIT230284.

    CHEN Tao, LI Huiqin, LI Wei, et al. Reconfigurable polynomial multiplication architecture for lattice-based post-quantum cryptography algorithms[J]. Journal of Electronics & Information Technology, 2023, 45(9): 3380–3392. doi: 10.11999/JEIT230284.
    [15] 付秋兴, 李伟, 别梦妮, 等. 格基后量子密码的可重构NTT运算单元与高效调度算法研究[J]. 电子学报, 2025, 53(4): 1182–1191. doi: 10.12263/DZXB.20240788.

    FU Qiuxing, LI Wei, BIE Mengni, et al. Research on reconfigurable NTT arithmetic unit and efficient scheduling algorithm for lattice post-quantum cryptography[J]. Acta Electronica Sinica, 2025, 53(4): 1182–1191. doi: 10.12263/DZXB.20240788.
    [16] LYUBASHEVSKY V, MICCIANCIO D, PEIKERT C, et al. SWIFFT: A modest proposal for FFT hashing[C]. Proceedings of the 15th International Workshop on Fast Software Encryption, Lausanne, Switzerland, 2008: 54–72. doi: 10.1007/978-3-540-71039-4_4.
    [17] COOLEY J W and TUKEY J W. An algorithm for the machine calculation of complex Fourier series[J]. Mathematics of Computation, 1965, 19(90): 297–301. doi: 10.1090/S0025-5718-1965-0178586-1.
    [18] GENTLEMAN W M and SANDE G. Fast Fourier transforms: For fun and profit[C]. Proceedings of the November 7-10, 1966, Fall Joint Computer Conference, San Francisco, USA, 1966: 563–578. doi: 10.1145/1464291.1464352.
    [19] ZHAO Yifan, XIE Ruiqi, XIN Guozhu, et al. A high-performance domain-specific processor with matrix extension of RISC-V for module-LWE applications[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(7): 2871–2884. doi: 10.1109/TCSI.2022.3162593.
    [20] MU Jianan, REN Yi, WANG Wen, et al. Scalable and conflict-free NTT hardware accelerator design: Methodology, proof, and implementation[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(5): 1504–1517. doi: 10.1109/TCAD.2022.3205552.
    [21] LIAO Junhao, CHO K T, and SHIEH M D. A unified FFT/NTT design for efficient NTRU equation solving in FALCON cryptography[C]. Proceedings of 2025 IEEE International Symposium on Circuits and Systems (ISCAS), London, United Kingdom, 2025: 1–5. doi: 10.1109/ISCAS56072.2025.11043318.
    [22] PORNIN T. Improved key pair generation for falcon, BAT and hawk[J]. IACR Cryptol ePrint Arch, 2023, 2023: 290. (查阅网上资料, 未找到本条文献刊名和卷期页码信息, 请确认).
  • 加载中
图(7) / 表(6)
计量
  • 文章访问数:  18
  • HTML全文浏览量:  4
  • PDF下载量:  2
  • 被引次数: 0
出版历程
  • 收稿日期:  2025-09-16
  • 修回日期:  2026-01-27
  • 录用日期:  2026-01-27
  • 网络出版日期:  2026-02-12

目录

    /

    返回文章
    返回