High-Parallelism, High-Precision RRAM Compute-in-Memory Chip Based on a Backward Programming Strategy

XIE Lifan, WEI Songtao, YAO Peng, WU Dong, TANG Jianshi, QIAN He, GAO Bin, WU Huaqiang

Citation: XIE Lifan, WEI Songtao, YAO Peng, WU Dong, TANG Jianshi, QIAN He, GAO Bin, WU Huaqiang. A fast and accurate programming strategy for analog in-memory computing validated with a transposable RRAM macro and 0.64% fully-parallel RMS error[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251174


doi: 10.11999/JEIT251174 cstr: 32379.14.JEIT251174
Funding: National Natural Science Foundation of China (Grant Nos. 92064001, 62025111)
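Details
    Author biographies:

    XIE Lifan: male, master's degree; research interest: compute-in-memory circuit design

    WEI Songtao: male, Ph.D.; research interest: compute-in-memory circuit design

    YAO Peng: male, associate researcher; research interest: compute-in-memory chips and systems

    WU Dong: male, associate researcher; research interest: design techniques for array-based circuit systems such as image sensors and non-volatile memories

    TANG Jianshi: male, associate professor; research interest: emerging memory devices and integration processes

    QIAN He: male, professor; research interest: emerging memory and computing technologies

    GAO Bin: male, professor; research interest: emerging memory and computing systems

    WU Huaqiang: male, professor; research interest: emerging memory and computing technologies

    Corresponding author:

    YAO Peng pyao@mail.tsinghua.edu.cn

  • CLC number: TP333.5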

A fast and accurate programming strategy for analog in-memory computing validated with a transposable RRAM macro and 0.64% fully-parallel RMS error

Funds: NSFC (92064001, 62025111)
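  • Abstract: Advances in artificial intelligence, such as large reasoning models, demand chips with high energy efficiency and high compute throughput. Compute-in-memory (CIM) based on resistive random-access memory (RRAM) can overcome the "memory wall" bottleneck of conventional architectures, greatly reduce data-movement overhead, and enable fast, low-power intelligent computing. At present, however, RRAM CIM lacks a fast, high-precision programming method suited to computation: conventional programming strategies suffer from the long time needed to verify devices one at a time and from precision loss caused by circuit non-idealities. To speed up programming and improve weight-programming precision for highly parallel analog RRAM CIM, this paper proposes a new systematic programming strategy that uses bidirectional matrix-vector multiplication (MVM) to detect mapping faults and introduces an in-situ offset compensation scheme based on redundant weight rows to efficiently calibrate the offsets of different channels. Based on this strategy, an RRAM CIM chip containing 640×256 sub-arrays and dual-channel ADCs was fabricated. With 4-bit inputs, 4-bit weights, and 8-bit outputs, the macro achieves a 4× reduction in programming latency and a lowest root-mean-square (RMS) error of 0.64% in fully parallel MVM, and the proposed programming method improves recognition accuracy in image-recognition tasks by 4.7% and 4.8%, respectively.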
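The abstract does not spell out how bidirectional MVM exposes mapping faults, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes an ideal linear array, all-ones probe vectors, and a fixed deviation threshold `tol`; the function names (`mvm_forward`, `mvm_backward`, `locate_fault_candidates`, `rms_error_percent`) and the normalization of the RMS error to a `full_scale` value are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def mvm_forward(w: Matrix, x: List[float]) -> List[float]:
    """Forward MVM: drive the rows with x, read one sum per column."""
    return [sum(x[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]


def mvm_backward(w: Matrix, z: List[float]) -> List[float]:
    """Backward (transposed) MVM: drive the columns with z, read one sum per row."""
    return [sum(wij * zj for wij, zj in zip(row, z)) for row in w]


def locate_fault_candidates(target: Matrix, programmed: Matrix,
                            tol: float) -> List[Tuple[int, int]]:
    """Return (row, col) cells that lie on both a deviating row and a deviating column."""
    ones_rows = [1.0] * len(target)
    ones_cols = [1.0] * len(target[0])
    # One forward read gives the deviation of every column sum.
    col_dev = [p - t for p, t in zip(mvm_forward(programmed, ones_rows),
                                     mvm_forward(target, ones_rows))]
    # One backward read gives the deviation of every row sum.
    row_dev = [p - t for p, t in zip(mvm_backward(programmed, ones_cols),
                                     mvm_backward(target, ones_cols))]
    bad_rows = [i for i, d in enumerate(row_dev) if abs(d) > tol]
    bad_cols = [j for j, d in enumerate(col_dev) if abs(d) > tol]
    return [(i, j) for i in bad_rows for j in bad_cols]


def rms_error_percent(measured: List[float], ideal: List[float],
                      full_scale: float) -> float:
    """RMS of (measured - ideal), expressed as a percentage of full_scale (assumed normalization)."""
    mse = sum((m - y) ** 2 for m, y in zip(measured, ideal)) / len(ideal)
    return 100.0 * math.sqrt(mse) / full_scale


# Toy 3x3 array with one hypothetical stuck cell at row 1, column 2.
target = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
programmed = [row[:] for row in target]
programmed[1][2] = 0.0

candidates = locate_fault_candidates(target, programmed, tol=0.5)
print(candidates)
```

In this toy case two parallel reads narrow the search to a single cell, whereas single-device verification would read all nine cells. With several faults the intersection is only a superset of the faulty cells, and deviations that cancel within a row or column would be missed, so a practical flow would presumably follow up with targeted checks.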
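  • Fig. 1  RRAM-CIM architecture

    Fig. 2  Flow of the single-device-verify memristor CIM programming scheme

    Fig. 3  BPAP flowchart

    Fig. 4  Schematic of (a) forward computation and (b) backward computation

    Fig. 5  AOSC array

    Fig. 6  Structure and operating principle of the TC-ADC

    Fig. 7  (a) Chip micrograph; (b) area breakdown

    Fig. 8  (a) MVM results; (b) standard deviation and RMS error with and without BPAP; (c) programming latency comparison; (d) offset with and without AOSC; (e) network accuracy

    Fig. 9  Chip overview and performance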

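    Table 1  Comparison with related work

    Method                        This work         VLSI [15]         ISSCC 2022 [16]   ISSCC 2023 [17]
    Process                       110 nm            14 nm             22 nm             22 nm
    Year                          2024              2021              2022              2023
    Computing domain              Current           Current           Charge            Current
    Backpropagation support
    CIM device                    Analog RRAM       Analog PCM        SLC-PCM           SLC/MLC RRAM
    Input                         4-bit voltage     Serial pulse      Bit-serial        Bit-serial
    Parallelism                   640               256               8                 16-128
    Input / weight precision      4/8               8                 8                 4/8
    Supply voltage (V)            1.5               0.8               0.8               0.7-0.8
    Array throughput (TOPS)       1.365 | 0.341     1.008             0.004             0.843 | 0.257
    Array capacity (K)            160 (640×256)     252 (256×1024)    256 (1024×256)    1024 (1024×1024)
    Energy efficiency (TOPS/W)    10.410 | 2.603    2.480             21.600            241.800 | 67.200
    Standard deviation            1.16%σ, 1.94%σ
    RMS error                     0.59%, 1~2%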
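As a quick consistency check on Table 1, the array power implied by the "This work" column can be estimated by dividing throughput by energy efficiency. This is a rough sketch only: it assumes the two values in each cell are paired operating points and that throughput and efficiency were measured under the same conditions, neither of which the table states. The helper name `implied_power_mw` is hypothetical, and the result is a derived estimate, not a figure reported by the authors.

```python
# Values copied from the "This work" column of Table 1.
throughput_tops = (1.365, 0.341)          # array throughput, TOPS
efficiency_tops_per_w = (10.410, 2.603)   # energy efficiency, TOPS/W


def implied_power_mw(tops: float, tops_per_w: float) -> float:
    """Power in mW implied by throughput divided by energy efficiency."""
    return 1000.0 * tops / tops_per_w


# Assumption: the i-th throughput value pairs with the i-th efficiency value.
powers_mw = [implied_power_mw(t, e)
             for t, e in zip(throughput_tops, efficiency_tops_per_w)]
throughput_ratio = throughput_tops[0] / throughput_tops[1]

for p in powers_mw:
    print(f"implied array power: {p:.1f} mW")
print(f"throughput ratio between the two values: {throughput_ratio:.2f}")
```

Under these assumptions both operating points imply roughly the same power, and the two throughput values differ by about a factor of four.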
  • [1] XU Xiaowei, DING Yukun, HU S X, et al. Scaling for edge inference of deep neural networks[J]. Nature Electronics, 2018, 1(4): 216–222. doi: 10.1038/s41928-018-0059-3.
    [2] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[Z]. OpenAI, 2019.
    [3] XU Xiaowei, DING Yukun, HU S X, et al. Scaling for edge inference of deep neural networks[J]. Nature Electronics, 2018, 1(4): 216–222. doi: 10.1038/s41928-018-0059-3.
    [4] FUJIWARA H, MORI H, ZHAO Weichang, et al. A 3nm, 32.5TOPS/W, 55.0TOPS/mm² and 3.78Mb/mm² fully-digital compute-in-memory macro supporting INT12 × INT12 with a parallel-MAC architecture and foundry 6T-SRAM bit cell[C]. Proceedings of 2024 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2024: 572–573. doi: 10.1109/ISSCC49657.2024.10454556.
    [5] GHOLAMI A, YAO Zhewei, KIM S, et al. AI and memory wall[J]. IEEE Micro, 2024, 44(3): 33–39. doi: 10.1109/MM.2024.3373763.
    [6] BÜCHEL J, VASILOPOULOS A, KERSTING B, et al. Gradient descent-based programming of analog in-memory computing cores[C]. Proceedings of 2022 International Electron Devices Meeting, San Francisco, USA, 2022: 779–782. doi: 10.1109/IEDM45625.2022.10019486.
    [7] JACOB B, KLIGYS S, CHEN Bo, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[EB/OL]. https://doi.org/10.48550/arXiv.1712.05877, 2017.
    [8] RADHAKRISHNAN J, BELMONTE A, CLIMA S, et al. Improving post-cycling low resistance state retention in resistive RAM with combined oxygen vacancy and copper filament[J]. IEEE Electron Device Letters, 2019, 40(7): 1072–1075. doi: 10.1109/LED.2019.2917553.
    [9] SHIM W, MENG Jian, PENG Xiaochen, et al. Impact of multilevel retention characteristics on RRAM based DNN inference engine[C]. Proceedings of 2021 IEEE International Reliability Physics Symposium, Monterey, USA, 2021: 1–4. doi: 10.1109/IRPS46558.2021.9405210.
    [10] CHIU Y C, KHWA W S, LI C Y, et al. A 22nm 8Mb STT-MRAM near-memory-computing macro with 8b-precision and 46.4–160.1TOPS/W for edge-AI devices[C]. Proceedings of 2023 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2023: 496–497. doi: 10.1109/ISSCC42615.2023.10067563.
    [11] YOU Deqi, KHWA W S, WU J J, et al. A 22nm nonvolatile AI-edge processor with 21.4TFLOPS/W using 47.25Mb lossless-compressed-computing STT-MRAM near-memory-compute macro[C]. Proceedings of 2024 IEEE Symposium on VLSI Technology and Circuits, Honolulu, USA, 2024: 1–2. doi: 10.1109/VLSITechnologyandCir46783.2024.10631408.
    [12] WANG Yang, YANG Xiaolong, QIN Yubin, et al. A 28nm 83.23TFLOPS/W POSIT-based compute-in-memory macro for high-accuracy AI applications[C]. Proceedings of 2024 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2024: 566–567. doi: 10.1109/ISSCC49657.2024.10454567.
    [13] KHWA W S, WU Pingchun, WU J J, et al. A 16nm 96Kb integer/floating-point dual-mode-gain-cell-computing-in-memory macro achieving 73.3–163.3TOPS/W and 33.2–91.2TFLOPS/W for AI edge-devices[C]. Proceedings of 2024 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2024: 568–569. doi: 10.1109/ISSCC49657.2024.10454447.
    [14] MORI H, ZHAO Weichang, LEE C E, et al. A 4nm 6163-TOPS/W/b 4790-TOPS/mm²/b SRAM based digital-computing-in-memory macro supporting bit-width flexibility and simultaneous MAC and weight update[C]. Proceedings of 2023 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2023: 132–133. doi: 10.1109/ISSCC42615.2023.10067555.
    [15] KHADDAM-ALJAMEH R, STANISAVLJEVIC M, FORNT MAS J, et al. HERMES core – A 14nm CMOS and PCM-based in-memory compute core using an array of 300ps/LSB linearized CCO-based ADCs and local digital processing[C]. Proceedings of 2021 Symposium on VLSI Circuits, Kyoto, Japan, 2021: 1–2. doi: 10.23919/VLSICircuits52068.2021.9492362.
    [16] HUNG J M, HUANG Y H, HUANG S P, et al. An 8-Mb DC-current-free binary-to-8b precision ReRAM nonvolatile computing-in-memory macro using time-space-readout with 1286.4–21.6TOPS/W for edge-AI devices[C]. Proceedings of 2022 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2022: 1–3. doi: 10.1109/ISSCC42614.2022.9731715.
    [17] HUANG W H, WEN Taihao, HUNG J M, et al. A nonvolatile AI-edge processor with 4MB SLC-MLC hybrid-mode ReRAM compute-in-memory macro and 51.4–251TOPS/W[C]. Proceedings of 2023 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2023: 15–17. doi: 10.1109/ISSCC42615.2023.10067610.
Publication history
  • Revised:  2026-02-06
  • Accepted:  2026-02-06
  • Published online:  2026-02-28
