Lightweight AdderNet Circuit Enabled by STT-MRAM In-Memory Absolute Difference Computation

WANG Lixun, ZHANG Yuejun, LI Qikang, ZHANG Huihong, WEN Liang

Citation: WANG Lixun, ZHANG Yuejun, LI Qikang, ZHANG Huihong, WEN Liang. Lightweight AdderNet Circuit Enabled by STT-MRAM In-Memory Absolute Difference Computation[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250627


doi: 10.11999/JEIT250627 cstr: 32379.14.JEIT250627
Funds: The National Natural Science Foundation of China (62474100, 62174121, 62134002), the "Vanguard Geese Leading and X" Science and Technology Program of Zhejiang Province (2025C01063), the Key R&D Program of Ningbo Science and Technology Yongjiang 2035 (2024Z139), the Key R&D Program of Cixi (CZ2025006), and the Graduate Student Scientific Research and Innovation Project of Ningbo University
Article details
    About the authors:

    WANG Lixun: male, Ph.D. candidate; research interests: design and implementation of low-power computing-in-memory circuits

    ZHANG Yuejun: male, professor; research interests: theory and design of low-power, high-information-density integrated circuits

    LI Qikang: male, Ph.D. candidate; research interests: design of efficient deep neural network accelerators

    ZHANG Huihong: female, associate professor; research interests: control theory and applications, and theory and optimized design of low-power integrated circuits

    WEN Liang: male, Ph.D.; research interests: design and implementation of low-power memory circuits

    Corresponding author:

    ZHANG Yuejun, zhangyuejun@nbu.edu.cn

  • CLC number: TN403

  • Abstract: As artificial intelligence research advances, the demand for deploying Convolutional Neural Networks (CNN) in resource-constrained environments continues to grow. However, constrained by the von Neumann architecture, the multiply-accumulate operations incurred by layer-by-layer stacking of convolution kernels in CNN accelerators grow super-linearly as model depth increases. To address this, this paper proposes a lightweight Adder Neural Network (AdderNet) acceleration circuit design based on Spin Transfer Torque-Magnetoresistive Random Access Memory (STT-MRAM). First, the L1 norm is introduced into a computing-in-memory architecture, and an STT-MRAM in-memory absolute-difference computation method is proposed to replace multiply-accumulate operations with lightweight additions. Second, a configurable full adder based on magnetoresistive-state mapping is designed and combined with a sparsity optimization strategy that skips redundant logic evaluations involving zero-valued operands. Finally, a parallel full-adder array supporting single-cycle carry-chain updates is constructed to achieve efficient convolution-kernel mapping and parallel multi-kernel L1-norm computation. Experimental results show that, on the CIFAR-10 dataset, the accelerator achieves 90.66% recognition accuracy, only 1.18% lower than the software model, while reaching a maximum throughput of 32.31 GOPS and a peak energy efficiency of 494.56 GOPS/W at 133 MHz.
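
The abstract's central substitution, replacing each multiply-accumulate with the negative L1 distance between an input patch and a filter as defined for AdderNet in [10], can be illustrated with a short software sketch. This is a minimal NumPy illustration of the arithmetic only, not the paper's STT-MRAM circuit; the function name and toy tensor shapes are assumptions made here for clarity.

```python
import numpy as np

def addernet_layer(x, filters):
    """AdderNet-style layer: each output value is the negative L1 distance
    between an input patch and a filter, so only subtractions, absolute
    values and additions are used instead of multiply-accumulate (see [10]).
    x:       input feature map, shape (H, W, C_in)
    filters: kernel bank, shape (K, K, C_in, C_out)
    Returns an output map of shape (H-K+1, W-K+1, C_out); stride 1, no padding."""
    H, W, _ = x.shape
    K, _, _, C_out = filters.shape
    out = np.zeros((H - K + 1, W - K + 1, C_out))
    for m in range(H - K + 1):
        for n in range(W - K + 1):
            patch = x[m:m + K, n:n + K, :]          # K x K x C_in window
            for t in range(C_out):
                out[m, n, t] = -np.sum(np.abs(patch - filters[:, :, :, t]))
    return out

# Toy usage: a 6x6 single-channel input and two 3x3 kernels.
x = np.random.randint(0, 8, size=(6, 6, 1))
w = np.random.randint(0, 8, size=(3, 3, 1, 2))
print(addernet_layer(x, w).shape)   # (4, 4, 2)
```

In the paper, this accumulation of absolute differences is what the STT-MRAM full-adder array (Figures 2 to 5) computes in memory, together with the sparsity and single-cycle carry-chain optimizations described in the abstract.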
  • Figure 1  STT-MRAM device and its connection structure

    Figure 2  STT-MRAM-based full-adder cell

    Figure 3  Computation process of the STT-MRAM full-adder cell

    Figure 4  Structure of the STT-MRAM-based full-adder array

    Figure 5  Hardware architecture and implementation of the AdderNet algorithm

    Figure 6  Resistance-state distribution of the STT-MRAM device

    Figure 7  Timing waveform of the carry Zi+1 computation

    Figure 8  Timing waveform of the sum bit S computation

    Figure 9  Accuracy and loss curves of AdderNet

    Figure 10  Monte Carlo simulation of the adder read circuit

    Figure 11  Confusion matrix comparison with the baseline model

    Table 1  Operand statistics of the multilayer perceptron

| Layer pair | Total operands | Zero-valued operands | Proportion (%) |
|---|---|---|---|
| Input layer - hidden layer | 1,003,250,000 | 868,548,048 | 86.57 |
| Hidden layer - output layer | 12,800,000 | 6,795,030 | 53.08 |
| Input layer - output layer | 1,016,320,000 | 875,613,078 | 86.16 |
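
The high zero-operand ratios in Table 1 are what make the sparsity strategy mentioned in the abstract worthwhile: when a zero participates, |a - 0| = |a| and |0 - w| = |w|, so the subtraction can be bypassed. The sketch below is a software illustration of that idea under assumptions made here (hypothetical function names, toy data); it is not the paper's hardware logic, and Table 1's exact counting convention may differ.

```python
import numpy as np

def zero_operand_stats(activations, weights):
    """Count operand pairs in an element-wise |a - w| computation where at
    least one operand is zero, mirroring the kind of per-layer statistics
    reported in Table 1."""
    a, w = activations.ravel(), weights.ravel()
    zero_involved = np.count_nonzero((a == 0) | (w == 0))
    return a.size, zero_involved, 100.0 * zero_involved / a.size

def sparse_abs_diff_sum(activations, weights):
    """Sum of absolute differences in which pairs containing a zero operand
    bypass the subtraction: their contribution is just |a| or |w|, and only
    the remaining pairs need a full |a - w| evaluation."""
    a, w = activations.ravel(), weights.ravel()
    zero_a = (a == 0)
    zero_w = (w == 0)
    both_nonzero = ~(zero_a | zero_w)
    passthrough = np.sum(np.abs(w[zero_a])) + np.sum(np.abs(a[zero_w & ~zero_a]))
    return passthrough + np.sum(np.abs(a[both_nonzero] - w[both_nonzero]))

# Toy usage with ReLU-like sparse activations and weights that contain zeros.
acts = np.random.randint(0, 8, size=1000) * (np.random.rand(1000) < 0.15)
wts = np.random.randint(-3, 4, size=1000)
print(zero_operand_stats(acts, wts))
print(sparse_abs_diff_sum(acts, wts))
```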

    Table 2  Comparison with related work

| | i5-1235U CPU [14] | i7-12700H CPU [14] | i5-1235U iGPU [14] | i7-12700H iGPU [14] | ICCAD 2022 [15] | TCAS-I 2024 [13] | TCAS-I 2025 [16] | This work |
|---|---|---|---|---|---|---|---|---|
| Process (nm) | 10 | 10 | 10 | 10 | N/A | 65 | N/A | 40 |
| Voltage (V) | 1.1-1.3 | 1.1-1.3 | 0.9-1.1 | 0.9-1.1 | 12 (FPGA) | 1.2 | 12 (FPGA) | 0.6-1.1 |
| Cell structure | N/A | N/A | N/A | N/A | LUT | 2T1M | LUT | 3T3M |
| Frequency (MHz) | 4400 | 4700 | 1200 | 1400 | 200 | 117 | 200 | 133 |
| Bit width (bit) | 8 | 8 | 8 | 8 | 8 | 8 | 6 | 8 |
| Power (mW) | 5500 | 11500 | 1200 | 1650 | 1695 | 63.82 | 381 | 65.33 |
| Network model | ResNet-50 | ResNet-50 | ResNet-50 | ResNet-50 | ResNet-20 | VGG-8 | ResNet-20 | ResNet-20 |
| Dataset | N/A | N/A | N/A | N/A | CIFAR-10 | CIFAR-10 | CIFAR-10 | CIFAR-10 |
| Accuracy (%) | N/A | N/A | N/A | N/A | 89.9 | 93.72 | 90.52 | 90.66 |
| Max throughput (GOPS) | 151.18 | 424 | 200.32 | 360.49 | 214.6 | 20.93 | 12.08/kLUT | 32.31 |
| Energy efficiency (GOPS/W) | 27.48 | 2.6 | 166.9 | 218.48 | 126.6 | 246.68 | 562.5 | 494.56 |
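
As a quick consistency check on the last row (a reading aid only, using no figures beyond those shown in the table), the peak energy efficiency of this work follows from dividing the maximum throughput by the power: 32.31 GOPS / 65.33 mW = 32.31 / 0.06533 W ≈ 494.6 GOPS/W, matching the 494.56 GOPS/W entry up to rounding of the displayed values.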
  • [1] QI Haoran, QIU Yuwei, LUO Xing, et al. An efficient latent style guided transformer-CNN framework for face super-resolution[J]. IEEE Transactions on Multimedia, 2024, 26: 1589–1599. doi: 10.1109/TMM.2023.3283856.
    [2] CHEN Xiaolei, WANG Xing, ZHANG Xuegong, et al. Adjacent coordination network for salient object detection in 360 degree omnidirectional images[J]. Journal of Electronics & Information Technology, 2024, 46(12): 4529–4541. doi: 10.11999/JEIT240502.
    [3] LI Moqian, YANG Zhizhuo, LI Ru, et al. Evidence sentence extraction for reading comprehension based on multi-scale convolution[J]. Journal of Chinese Information Processing, 2024, 38(8): 128–139, 157. doi: 10.3969/j.issn.1003-0077.2024.08.015.
    [4] LU Ye, XIE Kunpeng, XU Guanbin, et al. MTFC: A multi-GPU training framework for cube-CNN-based hyperspectral image classification[J]. IEEE Transactions on Emerging Topics in Computing, 2021, 9(4): 1738–1752. doi: 10.1109/TETC.2020.3016978.
    [5] HONG H, CHOI D, KIM N, et al. Mobile-X: Dedicated FPGA implementation of the MobileNet accelerator optimizing depthwise separable convolution[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024, 71(11): 4668–4672. doi: 10.1109/TCSII.2024.3440884.
    [6] MUN H G, MOON S, KIM B, et al. Bottleneck-stationary compact model accelerator with reduced requirement on memory bandwidth for edge applications[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(2): 772–782. doi: 10.1109/TCSI.2022.3222862.
    [7] WANG Tianyu, SHEN Zhaoyan, and SHAO Zili. CNN acceleration with joint optimization of practical PIM and GPU on embedded devices[C]. IEEE 40th International Conference on Computer Design (ICCD), Olympic Valley, CA, USA, 2022: 377–384. doi: 10.1109/ICCD56317.2022.00062.
    [8] BLOTT M, PREUßER T B, FRASER N J, et al. FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks[J]. ACM Transactions on Reconfigurable Technology and Systems, 2018, 11(3): 16. doi: 10.1145/3242897.
    [9] CONTI F, PAULIN G, GAROFALO A, et al. Marsellus: A heterogeneous RISC-V AI-IoT end-node SoC with 2–8 b DNN acceleration and 30%-boost adaptive body biasing[J]. IEEE Journal of Solid-State Circuits, 2024, 59(1): 128–142. doi: 10.1109/JSSC.2023.3318301.
    [10] CHEN Hanting, WANG Yunhe, XU Chunjing, et al. AdderNet: Do we really need multiplications in deep learning?[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 1465–1474. doi: 10.1109/CVPR42600.2020.00154.
    [11] ZHANG Heng, HE Sunan, LU Xin, et al. SSM-CIM: An efficient CIM macro featuring single-step multi-bit MAC computation for CNN edge inference[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(11): 4357–4368. doi: 10.1109/TCSI.2023.3301814.
    [12] YONG Ruoxue and JIANG Yanfeng. Research on thermal analysis method of 3D-stacked MRAM[J]. Acta Electronica Sinica, 2023, 51(10): 2775–2782. doi: 10.12263/DZXB.20220275.
    [13] LUO Lichuan, DENG Erya, LIU Dijun, et al. CiTST-AdderNets: Computing in toggle spin torques MRAM for energy-efficient AdderNets[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2024, 71(3): 1130–1143. doi: 10.1109/TCSI.2023.3343081.
    [14] OpenVINO. OpenVINO 2025.3[EB/OL]. https://docs.openvino.ai/2025/index.html, 2025.
    [15] ZHANG Yunxiang, SUN Biao, JIANG Weixiong, et al. WSQ-AdderNet: Efficient weight standardization based quantized AdderNet FPGA accelerator design with high-density INT8 DSP-LUT co-packing optimization[C]. IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Diego, USA, 2022: 1–9.
    [16] ZHANG Yunxiang, AL KAILANI O, ZHOU Bin, et al. AdderNet 2.0: Optimal addernet accelerator designs with activation-oriented quantization and fused bias removal-based memory optimization[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2025. doi: 10.1109/TCSI.2025.3539912.
Figures (11) / Tables (2)
Publication history
  • Received: 2025-04-28
  • Revised: 2025-09-17
  • Available online: 2025-09-19
