Technology Developments and Applications of In-memory Computing Processors
Abstract: With the rapid growth of data, the memory wall of the von Neumann architecture has become a key bottleneck to further improvements in computing performance. Memory-centric computing architectures, including In-Memory Computing (IMC) and Near-Memory Computing (NMC), are expected to break the von Neumann bottleneck and greatly improve computing power and energy efficiency. This paper reviews the development history and research status of memory-centric computing chips. It then introduces the basic principles, advantages, and open problems of in-memory computing based on a variety of memory media, including traditional memories (DRAM, SRAM, and Flash) and emerging non-volatile memories (ReRAM, PCM, MRAM, FeFET, etc.). Next, taking Witmem's mass-produced WTM2101 chip as an example, it highlights the circuit structure and current applications of IMC chips. Finally, it analyses the future prospects and challenges of memory-centric computing chips.
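In resistive crossbar arrays (for example ReRAM or PCM), the core in-memory operation is a matrix-vector multiplication performed where the weights are stored. Each weight is held as a cell conductance and each input is applied as a row voltage. Ohm's law gives each cell's current, and Kirchhoff's current law sums those currents on each column. The Python sketch below models only this idealized behaviour. It ignores wire resistance, device nonlinearity, noise, and the ADCs needed to digitize the column currents. The function name and example values are hypothetical.

```python
from typing import List

def crossbar_mvm(conductances: List[List[float]], voltages: List[float]) -> List[float]:
    """Return the column currents of an ideal resistive crossbar.

    conductances[i][j] is the conductance of the cell at row i, column j;
    voltages[i] is the read voltage applied to row i.
    Ohm's law: each cell passes I = G * V.
    Kirchhoff's current law: currents of all cells on a column add on its bit line.
    """
    if len(conductances) != len(voltages):
        raise ValueError("one conductance row per input voltage is required")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            currents[j] += conductances[i][j] * v
    return currents

# Hypothetical 3x2 array: conductances in microsiemens, voltages in volts,
# so the resulting column currents are in microamperes.
G = [[1.0, 2.0],
     [0.5, 0.0],
     [3.0, 1.0]]
V = [0.1, 0.2, 0.0]
print(crossbar_mvm(G, V))
```

Each column current is the dot product of the input vector with one column of stored weights, so a single read step evaluates every column in parallel. No weights are moved to a separate processor.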
Figure 2 Evolution of computing architectures based on different memory media[39]
Figure 4 Basic principle of DRAM-based in-memory computing[43]
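Reference [43] (Ambit) performs bulk bitwise AND and OR in commodity DRAM by activating three rows at once. Charge sharing drives each bit line to the majority value of the three cells, and that value is written back into all three rows. With a third "control" row preset to all 0s, MAJ(A, B, 0) = A AND B; preset to all 1s, MAJ(A, B, 1) = A OR B. The Python sketch below models only this logic, not the analog charge sharing, and treats a row as a list of bits. The function names are illustrative. Because activation overwrites all three rows, Ambit first copies the operands into designated rows; the sketch mimics this by operating on copies.

```python
from typing import List

Row = List[int]  # one DRAM row, modelled as a list of 0/1 bits

def triple_row_activate(a: Row, b: Row, c: Row) -> Row:
    """Model of activating three rows at once: each bit line settles to the
    majority of the three cells, and that value is written back to all three rows."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("rows must have the same width")
    result = [1 if x + y + z >= 2 else 0 for x, y, z in zip(a, b, c)]
    # Destructive operation: all three rows now hold the result.
    a[:] = result
    b[:] = result
    c[:] = result
    return result

def bulk_and(a: Row, b: Row) -> Row:
    # Operate on copies so the source rows survive; a control row of all 0s
    # turns the majority into AND: MAJ(a, b, 0) == a AND b.
    return triple_row_activate(list(a), list(b), [0] * len(a))

def bulk_or(a: Row, b: Row) -> Row:
    # A control row of all 1s turns the majority into OR: MAJ(a, b, 1) == a OR b.
    return triple_row_activate(list(a), list(b), [1] * len(a))

a = [1, 1, 0, 0]
b = [1, 0, 1, 0]
print(bulk_and(a, b), bulk_or(a, b))  # [1, 0, 0, 0] [1, 1, 1, 0]
```

In real DRAM, one activation processes an entire row (thousands of bits) in parallel. This is why the scheme pays off for bulk bitwise operations.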
Figure 6 Layout, micrograph, and structure of an MRAM-based in-memory computing array[49]
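MTJ resistances are low, which makes conventional current summation difficult. The array in [49] is reported to sum cell resistances along a series-connected column instead. The toy model below is a heavily simplified sketch under that assumption. Inputs and weights are binary, and each cell presents one of two hypothetical resistances depending on the product of its input and weight. A constant read current turns the summed resistance into a voltage. The resistance values, read current, and cell mapping are illustrative assumptions, not parameters taken from [49].

```python
from typing import List

R_P = 5_000.0     # ohms, hypothetical low (parallel-state) cell resistance
R_AP = 10_000.0   # ohms, hypothetical high (antiparallel-state) cell resistance
I_READ = 10e-6    # amperes, hypothetical constant read current

def column_resistance(inputs: List[int], weights: List[int]) -> float:
    """Series-connected cells: the column resistance is the sum of cell resistances.
    Assumption: a cell presents R_AP when input * weight == 1, otherwise R_P."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(R_AP if x * w == 1 else R_P for x, w in zip(inputs, weights))

def decode_dot_product(r_total: float, n_cells: int) -> int:
    """Invert the linear relation R_total = n * R_P + (R_AP - R_P) * sum(x * w)."""
    return round((r_total - n_cells * R_P) / (R_AP - R_P))

x = [1, 0, 1, 1]
w = [1, 1, 0, 1]
r = column_resistance(x, w)
v = I_READ * r  # read-out voltage across the column
print(r, v, decode_dot_product(r, len(x)))
```

The summed resistance is linear in the binary dot product, so the multiply-accumulate result can be read from one voltage measurement per column.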
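Table 1 Performance comparison of in-memory computing chips based on different memory media

| Criterion | SRAM | DRAM | Flash | ReRAM | PCM | FeFET | MRAM |
|---|---|---|---|---|---|---|---|
| Non-volatile | No | No | Yes | Yes | Yes | Yes | Yes |
| Multi-bit storage | No | No | Yes | Yes | Yes | Yes | No |
| Area efficiency | Low | Medium | High | High | High | High | High |
| Power efficiency | Low | Low | High | High | High | High | High |
| Process scalability | Good | Good | Relatively poor | Good | Relatively good | Good | Good |
| Cost | High | Relatively high | Low | Low | Relatively low | Low | Low |
| Technology maturity | Test chip | Test chip | Mass-produced product | Test chip | Test chip | Device | Test chip |

Table 2 Cumulative cosine similarity of the neural network

| Neural network layer | Cumulative cosine similarity |
|---|---|
| Layer 0 | 0.993 |
| Layer 1 | 0.996 |
| Layer 2 | 0.997 |
| Layer 3 | 0.998 |
| Layer 4 | 0.998 |
| Layer 5 | 0.997 |
| Layer 6 | 0.994 |
| Entire network | 0.994 |

Table 3 Performance comparison between WTM2101 and comparable products on the market

| Task | Existing products: compute complexity (MOPS) | Existing products: power (mA) | WTM2101: compute complexity (MOPS) | WTM2101: power (mA) |
|---|---|---|---|---|
| Voice activity detection | 0.1 | 0.1 | 0.1 | 0.07 |
| Voice wake-up | 20 | 0.6 | 400 | 0.4 |
| 40-command-word recognition | 30 | 2 | 400 | 0.6 |
| 100-command-word recognition | 100 | 10 | 400 | 0.8 |
| Ambient noise reduction | 150 | 15 | 800 | 1 |
| Voiceprint recognition | 1500 | 150 | 1500 | 2 |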
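Table 2 reports a cosine similarity per layer: cos(a, b) = a·b / (|a||b|), where 1.0 means the two vectors point in exactly the same direction. A plausible reading, which is an assumption here, is the following. Each layer's output produced by the chip is compared with a floating-point reference output for the same input. Because the chip runs the network end to end, errors from earlier layers carry forward, hence "cumulative". The Python sketch below computes such per-layer similarities. The layer names and vectors are hypothetical and are not the data behind Table 2.

```python
import math
from typing import Dict, List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def per_layer_similarity(reference: Dict[str, List[float]],
                         measured: Dict[str, List[float]]) -> Dict[str, float]:
    """Compare each layer's measured output with its reference output.
    Assumption: 'measured' comes from running the hardware end to end, so each
    entry already includes the error accumulated in earlier layers."""
    return {name: cosine_similarity(reference[name], measured[name])
            for name in reference}

# Hypothetical outputs for two layers (not data from Table 2).
ref = {"layer0": [0.5, -1.2, 2.0], "layer1": [1.0, 0.0, 3.0]}
hw = {"layer0": [0.48, -1.15, 2.05], "layer1": [0.9, 0.1, 3.1]}
for name, s in per_layer_similarity(ref, hw).items():
    print(f"{name}: {s:.3f}")
```

Cosine similarity ignores overall scale. A uniform gain error in the analog readout therefore does not lower the score; only distortions of the output's direction do.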
[1] CHEN C L P and ZHANG Chunyang. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data[J]. Information Sciences, 2014, 275: 314–347. doi: 10.1016/j.ins.2014.01.015
[2] WULF W A and MCKEE S A. Hitting the memory wall: Implications of the obvious[J]. ACM SIGARCH Computer Architecture News, 1995, 23(1): 20–24. doi: 10.1145/216585.216588
[3] ZIDAN M A, STRACHAN J P, and LU W D. The future of electronics based on memristive systems[J]. Nature Electronics, 2018, 1(1): 22–29. doi: 10.1038/s41928-017-0006-8
[4] ZHANG He. Computing in memory chip design with hybrid devices based on MRAM and SRAM[D]. [Ph. D. dissertation], Beihang University, 2021. (in Chinese)
[5] NAIR R, ANTAO S F, BERTOLLI C, et al. Active memory cube: A processing-in-memory architecture for exascale systems[J]. IBM Journal of Research and Development, 2015, 59(2/3): 17:1–17:14. doi: 10.1147/JRD.2015.2409732
[6] AKIN B, FRANCHETTI F, and HOE J C. Data reorganization in memory using 3D-stacked DRAM[J]. ACM SIGARCH Computer Architecture News, 2015, 43(3S): 131–143. doi: 10.1145/2872887.2750397
[7] FARMAHINI-FARAHANI A, AHN J H, MORROW K, et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules[C]. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Burlingame, USA, 2015: 283–295.
[8] GAO M, AYERS G, and KOZYRAKIS C. Practical near-data processing for in-memory analytics frameworks[C]. 2015 International Conference on Parallel Architecture and Compilation (PACT), San Francisco, USA, 2015: 113–124.
[9] KAUTZ W H. Cellular logic-in-memory arrays[J]. IEEE Transactions on Computers, 1969, C-18(8): 719–727. doi: 10.1109/T-C.1969.222754
[10] STONE H S. A logic-in-memory computer[J]. IEEE Transactions on Computers, 1970, C-19(1): 73–78. doi: 10.1109/TC.1970.5008902
[11] PATTERSON D, ANDERSON T, CARDWELL N, et al. Intelligent RAM (IRAM): Chips that remember and compute[C]. 1997 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, San Francisco, USA, 1997: 224–225.
[12] KANG Yi, HUANG Wei, YOO S M, et al. FlexRAM: Toward an advanced intelligent memory system[C]. 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Austin, USA, 1999: 192–201.
[13] LI Shuangchen, NIU Dimin, MALLADI K T, et al. DRISA: A DRAM-based reconfigurable in-situ accelerator[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 288–301.
[14] SKARLATOS D, KIM N S, and TORRELLAS J. PageForge: A near-memory content-aware page-merging architecture[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 302–314.
[15] AGRAWAL S R, IDICULA S, RAGHAVAN A, et al. A many-core architecture for in-memory data processing[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 245–258.
[16] SEBASTIAN A, TUMA T, PAPANDREOU N, et al. Temporal correlation detection using computational phase-change memory[J]. Nature Communications, 2017, 8(1): 1115. doi: 10.1038/s41467-017-01481-9
[17] BISWAS A and CHANDRAKASAN A P. CONV-SRAM: An energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks[J]. IEEE Journal of Solid-State Circuits, 2019, 54(1): 217–230. doi: 10.1109/JSSC.2018.2880918
[18] LIU Qi, GAO Bin, YAO Peng, et al. 33.2 A fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing[C]. 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2020: 500–502.
[19] ZHU Haozhe, JIAO Bo, ZHANG Jinshan, et al. COMB-MCM: Computing-on-memory-boundary NN processor with bipolar bitwise sparsity optimization for scalable multi-chiplet-module edge machine learning[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 1–3.
[20] TAN Fei, WANG Yiming, YANG Yiming, et al. A ReRAM-based computing-in-memory convolutional-macro with customized 2T2R bit-cell for AIoT chip IP applications[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(9): 1534–1538. doi: 10.1109/TCSII.2020.3013336
[21] GUO Ruiqi, LIU Yonggang, ZHENG Shixuan, et al. A 5.1pJ/neuron 127.3us/inference RNN-based speech recognition processor using 16 computing-in-memory SRAM macros in 65nm CMOS[C]. 2019 Symposium on VLSI Circuits, Kyoto, Japan, 2019: C120–C121.
[22] WAN Weier, KUBENDRAN R, GAO Bin, et al. A voltage-mode sensing scheme with differential-row weight mapping for energy-efficient RRAM-based in-memory computing[C]. 2020 IEEE Symposium on VLSI Technology, Honolulu, USA, 2020: 1–2.
[23] SHEN Wensheng, HUANG Peng, WANG Xiangyu, et al. A novel capacitor-based stateful logic operation scheme for in-memory computing in 1T1R RRAM array[C]. 2020 4th IEEE Electron Devices Technology & Manufacturing Conference (EDTM), Penang, Malaysia, 2020: 1–4.
[24] YAO Peng, WU Huaqiang, GAO Bin, et al. Fully hardware-implemented memristor convolutional neural network[J]. Nature, 2020, 577(7792): 641–646. doi: 10.1038/s41586-020-1942-4
[25] MERRIKH-BAYAT F, GUO Xinjie, KLACHKO M, et al. High-performance mixed-signal neurocomputing with nanoscale floating-gate memory cell arrays[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(10): 4782–4790. doi: 10.1109/TNNLS.2017.2778940
[26] FICK L, BLAAUW D, SYLVESTER D, et al. Analog in-memory subthreshold deep neural network accelerator[C]. 2017 IEEE Custom Integrated Circuits Conference, Austin, USA, 2017: 1–4.
[27] MAHMOODI M R and STRUKOV D. An ultra-low energy internally analog, externally digital vector-matrix multiplier based on NOR flash memory technology[C]. The 55th Annual Design Automation Conference, San Francisco, USA, 2018: 33.
[28] KANG Mingu, GONUGONDLA S K, PATIL A, et al. A multi-functional in-memory inference processor using a standard 6T SRAM array[J]. IEEE Journal of Solid-State Circuits, 2018, 53(2): 642–655. doi: 10.1109/JSSC.2017.2782087
[29] YANG Jun, KONG Yuyao, WANG Zhen, et al. 24.4 Sandwich-RAM: An energy-efficient in-memory BWN architecture with pulse-width modulation[C]. 2019 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2019: 394–396.
[30] CHIU Y C, ZHANG Zhixiao, CHEN Jiajing, et al. A 4-Kb 1-to-8-bit configurable 6T SRAM-based computation-in-memory unit-macro for CNN-based AI edge processors[J]. IEEE Journal of Solid-State Circuits, 2020, 55(10): 2790–2801. doi: 10.1109/JSSC.2020.3005754
[31] JIA Hongyang, VALAVI H, TANG Yinqi, et al. A programmable heterogeneous microprocessor based on bit-scalable in-memory computing[J]. IEEE Journal of Solid-State Circuits, 2020, 55(9): 2609–2621. doi: 10.1109/JSSC.2020.2987714
[32] JIANG Zhewei, YIN Shihui, SEO J S, et al. C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism[J]. IEEE Journal of Solid-State Circuits, 2020, 55(7): 1888–1897. doi: 10.1109/JSSC.2020.2992886
[33] YIN Shihui, JIANG Zhewei, SEO J S, et al. XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks[J]. IEEE Journal of Solid-State Circuits, 2020, 55(6): 1733–1743. doi: 10.1109/JSSC.2019.2963616
[34] YAN Bonan, YANG Qing, CHEN Weihao, et al. RRAM-based spiking nonvolatile computing-in-memory processing engine with precision-configurable in situ nonlinear activation[C]. 2019 Symposium on VLSI Technology, Kyoto, Japan, 2019: T86–T87.
[35] XUE Chengxin, CHEN Weihao, LIU J S, et al. 24.1 A 1Mb multibit RRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors[C]. 2019 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2019: 388–390.
[36] XUE Chengxin, CHEN Weihao, LIU J S, et al. Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors[J]. IEEE Journal of Solid-State Circuits, 2020, 55(1): 203–215. doi: 10.1109/JSSC.2019.2951363
[37] ZHA Yue, NOWAK E, and LI Jing. Liquid silicon: A nonvolatile fully programmable processing-in-memory processor with monolithically integrated ReRAM[J]. IEEE Journal of Solid-State Circuits, 2020, 55(4): 908–919. doi: 10.1109/JSSC.2019.2963005
[38] ZHANG H, LIU J, BAI J, et al. HD-CIM: Hybrid-device computing-in-memory structure based on MRAM and SRAM to reduce weight loading energy of neural networks[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(11): 4465–4474.
[39] SEBASTIAN A, LE GALLO M, KHADDAM-ALJAMEH R, et al. Memory devices and applications for in-memory computing[J]. Nature Nanotechnology, 2020, 15(7): 529–544. doi: 10.1038/s41565-020-0655-z
[40] DONG Qing, SINANGIL M E, ERBAGCI B, et al. 15.3 A 351TOPS/W and 372.4GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications[C]. 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2020: 242–244.
[41] CHIH Y D, LEE P H, FUJIWARA H, et al. 16.4 An 89TOPS/W and 16.3TOPS/mm² all-digital SRAM-based full-precision compute-in-memory macro in 22nm for machine-learning edge applications[C]. 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2021: 252–254.
[42] YAN Bonan, HSU J L, YU Pangcheng, et al. A 1.041-Mb/mm² 27.38-TOPS/W signed-INT8 dynamic-logic-based ADC-less SRAM compute-in-memory macro in 28nm with reconfigurable bitwise operation for AI and embedded applications[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 188–190.
[43] SESHADRI V, LEE D, MULLINS T, et al. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, USA, 2017: 273–287.
[44] NATSUI M, SUZUKI D, SAKIMURA N, et al. Nonvolatile logic-in-memory LSI using cycle-based power gating and its application to motion-vector prediction[J]. IEEE Journal of Solid-State Circuits, 2015, 50(2): 476–489. doi: 10.1109/JSSC.2014.2362853
[45] ZHANG He, KANG Wang, CAO Kaihua, et al. Spintronic processing unit in spin transfer torque magnetic random access memory[J]. IEEE Transactions on Electron Devices, 2019, 66(4): 2017–2022. doi: 10.1109/TED.2019.2898391
[46] ZHANG He, KANG Wang, WANG Lezhi, et al. Stateful reconfigurable logic via a single-voltage-gated spin hall-effect driven magnetic tunnel junction in a spintronic memory[J]. IEEE Transactions on Electron Devices, 2017, 64(10): 4295–4301. doi: 10.1109/TED.2017.2726544
[47] WANG Haotian, KANG Wang, PAN Biao, et al. Spintronic computing-in-memory architecture based on voltage-controlled spin–orbit torque devices for binary neural networks[J]. IEEE Transactions on Electron Devices, 2021, 68(10): 4944–4950. doi: 10.1109/TED.2021.3102896
[48] DEAVILLE P, ZHANG Bonan, CHEN L Y, et al. A maximally row-parallel MRAM in-memory-computing macro addressing readout circuit sensitivity and area[C]. ESSCIRC 2021-IEEE 47th European Solid State Circuits Conference (ESSCIRC), Grenoble, France, 2021: 75–78.
[49] JUNG S, LEE H, MYUNG S, et al. A crossbar array of magnetoresistive memory devices for in-memory computing[J]. Nature, 2022, 601(7892): 211–216. doi: 10.1038/s41586-021-04196-6
[50] SONG K M, JEONG J S, PAN Biao, et al. Skyrmion-based artificial synapses for neuromorphic computing[J]. Nature Electronics, 2020, 3(3): 148–155. doi: 10.1038/s41928-020-0385-0
[51] CHEN Chao, LIN Tao, NIU Jianteng, et al. Surface acoustic wave controlled skyrmion-based synapse devices[J]. Nanotechnology, 2022, 33(11): 115205. doi: 10.1088/1361-6528/ac3f14
[52] HUANG Yangqi, KANG Wang, ZHANG Xichao, et al. Magnetic skyrmion-based synaptic devices[J]. Nanotechnology, 2017, 28(8): 08LT02. doi: 10.1088/1361-6528/aa5838
[53] HU Hanwen, WANG Weichen, CHEN C K, et al. A 512 Gb in-memory-computing 3D-NAND flash supporting similar-vector-matching operations on edge-AI devices[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 138–140.
[54] KIM M, LIU Muqing, EVERSON L R, et al. An embedded NAND flash-based compute-in-memory array demonstrated in a standard logic process[J]. IEEE Journal of Solid-State Circuits, 2022, 57(2): 625–638. doi: 10.1109/JSSC.2021.3098671