Technology Developments and Applications of In-memory Computing Processors

GUO Xinjie, WANG Guangyao, WANG Shaodi

Citation: GUO Xinjie, WANG Guangyao, WANG Shaodi. Technology Developments and Applications of In-memory Computing Processors[J]. Journal of Electronics & Information Technology, 2023, 45(5): 1888-1898. doi: 10.11999/JEIT220420

doi: 10.11999/JEIT220420
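Funds: Ministry of Science and Technology key special project "Science and Technology Boosting the Economy 2020" (SQ2020YFF0404823)
    Author biographies:

    GUO Xinjie: female, Ph.D.; research interest: compute-in-memory chip design

    WANG Guangyao: male, master's student; research interest: compute-in-memory chip design

    WANG Shaodi: male, Ph.D.; research interest: memory and compute-in-memory architecture design

    Corresponding author:

    WANG Shaodi shaodi.wang@witintech.com

  • CLC number: TN4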

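  • Abstract: With the rapid growth of data, the memory wall of the von Neumann architecture has become a key bottleneck to further gains in computing performance. Novel compute-in-memory architectures, comprising In-Memory Computing (IMC) and Near-Memory Computing (NMC) architectures, are expected to break this bottleneck and substantially improve computing power and energy efficiency. This paper reviews the development history and research status of compute-in-memory chips, together with the basic principles, advantages, and open problems of in-memory computing based on various memory media (conventional memories such as DRAM, SRAM, and Flash, and emerging non-volatile memories such as ReRAM, PCM, MRAM, and FeFET). Then, taking the mass-produced WTM2101 chip from Zhicun Technology (知存科技) as an example, it describes the circuit structure and current applications of compute-in-memory chips in detail. Finally, it analyzes the future prospects of compute-in-memory chips and the challenges they face.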
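  • Figure 1  Schematic of the evolution of computing architectures

    Figure 2  Evolution of computing architectures based on different storage media[39]

    Figure 3  Structure of SRAM-based in-memory computing cells

    Figure 4  Basic principle of DRAM-based in-memory computing[43]

    Figure 5  ReRAM-based in-memory computing array structure and test chip

    Figure 6  Layout, micrograph, and structure of an MRAM-based in-memory computing array[49]

    Figure 7  Principle of NOR Flash-based in-memory computing and related products

    Figure 8  Array structure and chip architecture of WTM2101

    Figure 9  Test results of 8-bit precision operations on the WTM2101 in-memory computing chip

    Figure 10  Earphone product equipped with WTM2101 and the automated deployment flow for WTM2101

    Figure 11  Noise reduction performance of WTM2101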

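    Table 1  Performance comparison of in-memory computing chips based on different storage media

    Criterion | SRAM | DRAM | Flash | ReRAM | PCM | FeFET | MRAM
    Non-volatility | (values not preserved)
    Multi-bit storage capability | (values not preserved)
    Area efficiency | (only "fair" preserved; column unknown)
    Power efficiency | (values not preserved)
    Process scalability | (only "relatively poor" and "relatively good" preserved; columns unknown)
    Cost | (only "relatively high" and "relatively low" preserved; columns unknown)
    Technology maturity | Test chip | Test chip | Mass-produced product | Test chip | Test chip | Device | Test chip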

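    Table 2  Cumulative cosine similarity of the neural network

    Neural network | Cumulative cosine similarity
    Layer 0 | 0.993
    Layer 1 | 0.996
    Layer 2 | 0.997
    Layer 3 | 0.998
    Layer 4 | 0.998
    Layer 5 | 0.997
    Layer 6 | 0.994
    Whole neural network | 0.994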
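
    A per-layer metric of the kind reported in Table 2 could be computed along the following lines. This is only a minimal sketch under stated assumptions, not the authors' measurement procedure: "cumulative" is assumed to mean that each layer of the hardware path receives the already-perturbed output of the previous layer, 8-bit uniform quantization stands in for the chip's analog error, and the helper names (`dense_relu`, `quantize`, `cumulative_similarity`) and the random toy network are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen here: two all-zero vectors count as identical.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def dense_relu(weights, x):
    """Fully connected layer followed by ReLU; `weights` is a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def quantize(values, bits=8):
    """Uniform quantization of non-negative values (assumed error model)."""
    peak = max(values)
    if peak == 0.0:
        return list(values)
    levels = 2 ** bits - 1
    return [round(v / peak * levels) / levels * peak for v in values]


def cumulative_similarity(layers, x):
    """Compare a reference path with a perturbed path after every layer.

    The perturbed path feeds its own quantized output forward, so errors
    accumulate from layer to layer (the assumed meaning of "cumulative").
    """
    reference, hardware = list(x), list(x)
    similarities = []
    for weights in layers:
        reference = dense_relu(weights, reference)
        hardware = quantize(dense_relu(weights, hardware))
        similarities.append(cosine_similarity(reference, hardware))
    return similarities


# Hypothetical toy network: three 16x16 layers with random weights.
random.seed(0)
layers = [
    [[random.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(16)]
    for _ in range(3)
]
x = [random.uniform(0.0, 1.0) for _ in range(16)]
sims = cumulative_similarity(layers, x)
for index, value in enumerate(sims):
    print(f"Layer {index}: {value:.3f}")
```

    The last entry of `sims` plays the role of the "whole neural network" row, since it compares the final outputs of the two paths.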

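    Table 3  Performance comparison between WTM2101 and comparable products on the market

    Criterion | Existing products, computational complexity (Mops) | Existing products, power consumption (mA) | WTM2101, computational complexity (Mops) | WTM2101, power consumption (mA)
    Voice activity detection | 0.1 | 0.1 | 0.1 | 0.07
    Voice wake-up | 20 | 0.6 | 400 | 0.4
    40-command-word recognition | 30 | 2 | 400 | 0.6
    100-command-word recognition | 100 | 10 | 400 | 0.8
    Environmental noise reduction | 150 | 15 | 800 | 1
    Voiceprint recognition | 1500 | 150 | 1500 | 2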
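
    One way to read Table 3 is to divide computational complexity by current draw for each column pair and compare the two quotients. The sketch below does this under explicit assumptions: the flattened rows are read as four values each (for example 20, 0.6, 400, 0.4 for voice wake-up), Mops per mA is used as an informal efficiency figure that the source itself does not define, and the names `TABLE3`, `mops_per_ma`, and `efficiency_gain` are hypothetical.

```python
# Each entry: task -> ((market Mops, market mA), (WTM2101 Mops, WTM2101 mA)).
# The four-value split of every row is an assumed reading of the table.
TABLE3 = {
    "voice activity detection": ((0.1, 0.1), (0.1, 0.07)),
    "voice wake-up": ((20, 0.6), (400, 0.4)),
    "40-command-word recognition": ((30, 2), (400, 0.6)),
    "100-command-word recognition": ((100, 10), (400, 0.8)),
    "environmental noise reduction": ((150, 15), (800, 1)),
    "voiceprint recognition": ((1500, 150), (1500, 2)),
}


def mops_per_ma(mops, milliamps):
    """Computational complexity delivered per unit of current (informal metric)."""
    return mops / milliamps


def efficiency_gain(market, wtm2101):
    """Ratio of WTM2101 Mops/mA to the market product's Mops/mA."""
    return mops_per_ma(*wtm2101) / mops_per_ma(*market)


gains = {task: efficiency_gain(market, chip) for task, (market, chip) in TABLE3.items()}
for task, gain in gains.items():
    print(f"{task}: {gain:.1f}x")
```

    Because the two product columns run workloads of different complexity in most rows, the ratio is only a rough indicator rather than a like-for-like benchmark.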
  • [1] CHEN C L P and ZHANG Chunyang. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data[J]. Information Sciences, 2014, 275: 314–347. doi: 10.1016/j.ins.2014.01.015
    [2] WULF W A and MCKEE S A. Hitting the memory wall: Implications of the obvious[J]. ACM SIGARCH Computer Architecture News, 1995, 23(1): 20–24. doi: 10.1145/216585.216588
    [3] ZIDAN M A, STRACHAN J P, and LU W D. The future of electronics based on memristive systems[J]. Nature Electronics, 2018, 1(1): 22–29. doi: 10.1038/s41928-017-0006-8
    [4] ZHANG He. Computing in memory chip design with hybrid devices based on MRAM and SRAM[D]. [Ph.D. dissertation], Beihang University, 2021.
    [5] NAIR R, ANTAO S F, BERTOLLI C, et al. Active memory cube: A processing-in-memory architecture for exascale systems[J]. IBM Journal of Research and Development, 2015, 59(2/3): 17:1–17:14. doi: 10.1147/JRD.2015.2409732
    [6] AKIN B, FRANCHETTI F, and HOE J C. Data reorganization in memory using 3D-stacked DRAM[J]. ACM SIGARCH Computer Architecture News, 2015, 43(3S): 131–143. doi: 10.1145/2872887.2750397
    [7] FARMAHINI-FARAHANI A, AHN J H, MORROW K, et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules[C]. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Burlingame, USA, 2015: 283–295.
    [8] GAO M, AYERS G, and KOZYRAKIS C. Practical near-data processing for in-memory analytics frameworks[C]. 2015 International Conference on Parallel Architecture and Compilation (PACT), San Francisco, USA, 2015: 113–124.
    [9] KAUTZ W H. Cellular logic-in-memory arrays[J]. IEEE Transactions on Computers, 1969, C-18(8): 719–727. doi: 10.1109/T-C.1969.222754
    [10] STONE H S. A logic-in-memory computer[J]. IEEE Transactions on Computers, 1970, C-19(1): 73–78. doi: 10.1109/TC.1970.5008902
    [11] PATTERSON D, ANDERSON T, CARDWELL N, et al. Intelligent RAM (IRAM): Chips that remember and compute[C]. 1997 IEEE International Solids-State Circuits Conference. Digest of Technical Papers, San Francisco, USA, 1997: 224–225.
    [12] KANG Yi, HUANG Wei, YOO S M, et al. FlexRAM: Toward an advanced intelligent memory system[C]. 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Austin, USA, 1999: 192–201.
    [13] LI Shuangchen, NIU Dimin, MALLADI K T, et al. DRISA: A DRAM-based reconfigurable in-situ accelerator[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 288–301.
    [14] SKARLATOS D, KIM N S, and TORRELLAS J. PageForge: A near-memory content-aware page-merging architecture[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 302–314.
    [15] AGRAWAL S R, IDICULA S, RAGHAVAN A, et al. A many-core architecture for in-memory data processing[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 245–258.
    [16] SEBASTIAN A, TUMA T, PAPANDREOU N, et al. Temporal correlation detection using computational phase-change memory[J]. Nature Communications, 2017, 8(1): 1115. doi: 10.1038/s41467-017-01481-9
    [17] BISWAS A and CHANDRAKASAN A P. CONV-SRAM: An energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks[J]. IEEE Journal of Solid-State Circuits, 2019, 54(1): 217–230. doi: 10.1109/JSSC.2018.2880918
    [18] LIU Qi, GAO Bin, YAO Peng, et al. 33.2 A fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing[C]. 2020 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2020: 500–502.
    [19] ZHU Haozhe, JIAO Bo, ZHANG Jinshan, et al. COMB-MCM: Computing-on-memory-boundary NN processor with bipolar bitwise sparsity optimization for scalable multi-chiplet-module edge machine learning[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 1–3.
    [20] TAN Fei, WANG Yiming, YANG Yiming, et al. A ReRAM-based computing-in-memory convolutional-macro with customized 2T2R bit-cell for AIoT chip IP applications[J]. IEEE Transactions on Circuits and Systems II:Express Briefs, 2020, 67(9): 1534–1538. doi: 10.1109/TCSII.2020.3013336
    [21] GUO Ruiqi, LIU Yonggang, ZHENG Shixuan, et al. A 5.1pJ/neuron 127.3us/inference RNN-based speech recognition processor using 16 computing-in-memory SRAM macros in 65nm CMOS[C]. 2019 Symposium on VLSI Circuits, Kyoto, Japan, 2019: C120–C121.
    [22] WAN Weier, KUBENDRAN R, GAO Bin, et al. A voltage-mode sensing scheme with differential-row weight mapping for energy-efficient RRAM-based in-memory computing[C]. 2020 IEEE Symposium on VLSI Technology, Honolulu, USA, 2020: 1–2.
    [23] SHEN Wensheng, HUANG Peng, WANG Xiangyu, et al. A novel capacitor-based stateful logic operation scheme for in-memory computing in 1T1R RRAM array[C]. 2020 4th IEEE Electron Devices Technology & Manufacturing Conference (EDTM), Penang, Malaysia, 2020: 1–4.
    [24] YAO Peng, WU Huaqiang, GAO Bin, et al. Fully hardware-implemented memristor convolutional neural network[J]. Nature, 2020, 577(7792): 641–646. doi: 10.1038/s41586-020-1942-4
    [25] MERRIKH-BAYAT F, GUO Xinjie, KLACHKO M, et al. High-performance mixed-signal neurocomputing with nanoscale floating-gate memory cell arrays[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(10): 4782–4790. doi: 10.1109/TNNLS.2017.2778940
    [26] FICK L, BLAAUW D, SYLVESTER D, et al. Analog in-memory subthreshold deep neural network accelerator[C]. 2017 IEEE Custom Integrated Circuits Conference, Austin, USA, 2017: 1–4.
    [27] MAHMOODI M R and STRUKOV D. An ultra-low energy internally analog, externally digital vector-matrix multiplier based on NOR flash memory technology[C]. The 55th Annual Design Automation Conference, San Francisco, USA, 2018: 33.
    [28] KANG Mingu, GONUGONDLA S K, PATIL A, et al. A multi-functional in-memory inference processor using a standard 6T SRAM array[J]. IEEE Journal of Solid-State Circuits, 2018, 53(2): 642–655. doi: 10.1109/JSSC.2017.2782087
    [29] YANG Jun, KONG Yuyao, WANG Zhen, et al. 24.4 Sandwich-RAM: An energy-efficient in-memory BWN architecture with pulse-width modulation[C]. 2019 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2019: 394–396.
    [30] CHIU Y C, ZHANG Zhixiao, CHEN Jiajing, et al. A 4-Kb 1-to-8-bit configurable 6T SRAM-based computation-in-memory unit-macro for CNN-based AI edge processors[J]. IEEE Journal of Solid-State Circuits, 2020, 55(10): 2790–2801. doi: 10.1109/JSSC.2020.3005754
    [31] JIA Hongyang, VALAVI H, TANG Yinqi, et al. A programmable heterogeneous microprocessor based on bit-scalable in-memory computing[J]. IEEE Journal of Solid-State Circuits, 2020, 55(9): 2609–2621. doi: 10.1109/JSSC.2020.2987714
    [32] JIANG Zhewei, YIN Shihui, SEO J S, et al. C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism[J]. IEEE Journal of Solid-State Circuits, 2020, 55(7): 1888–1897. doi: 10.1109/JSSC.2020.2992886
    [33] YIN Shihui, JIANG Zhewei, SEO J S, et al. XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks[J]. IEEE Journal of Solid-State Circuits, 2020, 55(6): 1733–1743. doi: 10.1109/JSSC.2019.2963616
    [34] YAN Bonan, YANG Qing, CHEN Weihao, et al. RRAM-based spiking nonvolatile computing-in-memory processing engine with precision-configurable in situ nonlinear activation[C]. 2019 Symposium on VLSI Technology, Kyoto, Japan, 2019: T86–T87.
    [35] XUE Chengxin, CHEN Weihao, LIU J S, et al. 24.1 A 1Mb multibit RRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors[C]. 2019 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2019: 388–390.
    [36] XUE Chengxin, CHEN Weihao, LIU J S, et al. Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors[J]. IEEE Journal of Solid-State Circuits, 2020, 55(1): 203–215. doi: 10.1109/JSSC.2019.2951363
    [37] ZHA Yue, NOWAK E, and LI Jing. Liquid silicon: A nonvolatile fully programmable processing-in-memory processor with monolithically integrated ReRAM[J]. IEEE Journal of Solid-State Circuits, 2020, 55(4): 908–919. doi: 10.1109/JSSC.2019.2963005
    [38] ZHANG H, LIU J, BAI J, et al. HD-CIM: Hybrid-device computing-in-memory structure based on MRAM and SRAM to reduce weight loading energy of neural networks[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(11): 4465–4474.
    [39] SEBASTIAN A, LE GALLO M, KHADDAM-ALJAMEH R, et al. Memory devices and applications for in-memory computing[J]. Nature Nanotechnology, 2020, 15(7): 529–544. doi: 10.1038/s41565-020-0655-z
    [40] DONG Qing, SINANGIL M E, ERBAGCI B, et al. 15.3 A 351TOPS/W and 372.4GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications[C]. 2020 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2020: 242–244.
    [41] CHIH Y D, LEE P H, FUJIWARA H, et al. 16.4 An 89TOPS/W and 16.3TOPS/mm2 all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications[C]. 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2021: 252–254.
    [42] YAN Bonan, HSU J L, YU Pangcheng, et al. A 1.041-Mb/mm2 27.38-TOPS/W signed-INT8 dynamic-logic-based ADC-less SRAM compute-in-memory macro in 28nm with reconfigurable bitwise operation for AI and embedded applications[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 188–190.
    [43] SESHADRI V, LEE D, MULLINS T, et al. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 273–287.
    [44] NATSUI M, SUZUKI D, SAKIMURA N, et al. Nonvolatile logic-in-memory LSI using cycle-based power gating and its application to motion-vector prediction[J]. IEEE Journal of Solid-State Circuits, 2015, 50(2): 476–489. doi: 10.1109/JSSC.2014.2362853
    [45] ZHANG He, KANG Wang, CAO Kaihua, et al. Spintronic processing unit in spin transfer torque magnetic random access memory[J]. IEEE Transactions on Electron Devices, 2019, 66(4): 2017–2022. doi: 10.1109/TED.2019.2898391
    [46] ZHANG He, KANG Wang, WANG Lezhi, et al. Stateful reconfigurable logic via a single-voltage-gated spin hall-effect driven magnetic tunnel junction in a spintronic memory[J]. IEEE Transactions on Electron Devices, 2017, 64(10): 4295–4301. doi: 10.1109/TED.2017.2726544
    [47] WANG Haotian, KANG Wang, PAN Biao, et al. Spintronic computing-in-memory architecture based on voltage-controlled spin–orbit torque devices for binary neural networks[J]. IEEE Transactions on Electron Devices, 2021, 68(10): 4944–4950. doi: 10.1109/TED.2021.3102896
    [48] DEAVILLE P, ZHANG Bonan, CHEN L Y, et al. A maximally row-parallel MRAM in-memory-computing macro addressing readout circuit sensitivity and area[C]. ESSCIRC 2021-IEEE 47th European Solid State Circuits Conference (ESSCIRC), Grenoble, France, 2021: 75–78.
    [49] JUNG S, LEE H, MYUNG S, et al. A crossbar array of magnetoresistive memory devices for in-memory computing[J]. Nature, 2022, 601(7892): 211–216. doi: 10.1038/s41586-021-04196-6
    [50] SONG K M, JEONG J S, PAN Biao, et al. Skyrmion-based artificial synapses for neuromorphic computing[J]. Nature Electronics, 2020, 3(3): 148–155. doi: 10.1038/s41928-020-0385-0
    [51] CHEN Chao, LIN Tao, NIU Jianteng, et al. Surface acoustic wave controlled skyrmion-based synapse devices[J]. Nanotechnology, 2022, 33(11): 115205. doi: 10.1088/1361-6528/ac3f14
    [52] HUANG Yangqi, KANG Wang, ZHANG Xichao, et al. Magnetic skyrmion-based synaptic devices[J]. Nanotechnology, 2017, 28(8): 08LT02. doi: 10.1088/1361-6528/aa5838
    [53] HU Hanwen, WANG Weichen, CHEN C K, et al. A 512 Gb in-memory-computing 3D-NAND flash supporting similar-vector-matching operations on edge-AI devices[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 138–140.
    [54] KIM M, LIU Muqing, EVERSON L R, et al. An embedded NAND flash-based compute-in-memory array demonstrated in a standard logic process[J]. IEEE Journal of Solid-State Circuits, 2022, 57(2): 625–638. doi: 10.1109/JSSC.2021.3098671
Publication history
  • Received: 2022-04-08
  • Revised: 2022-10-09
  • Published online: 2022-10-20
  • Issue date: 2023-05-10
