存内计算芯片研究进展及应用

郭昕婕; 王光燿; 王绍迪

doi:10.11999/JEIT220420

存内计算芯片研究进展及应用

doi: 10.11999/JEIT220420 cstr: 32379.14.JEIT220420

1.
北京知存科技有限公司北京 100191
2.
北京航空航天大学集成电路科学与工程学院北京 100191

基金项目: 科技部“科技助力经济2020”重点专项项目(SQ2020YFF0404823)

详细信息

作者简介:
郭昕婕：女，博士，研究方向为存算一体芯片设计

王光燿：男，硕士生，研究方向为存算一体芯片设计

王绍迪：男，博士，研究方向为存储器及存算一体架构设计

通讯作者:
王绍迪　shaodi.wang@witintech.com

中图分类号: TN4
计量
- 文章访问数: 3684
- HTML全文浏览量: 3117
- PDF下载量: 759
- 被引次数: 0
出版历程
- 收稿日期: 2022-04-08
- 修回日期: 2022-10-09
- 网络出版日期: 2022-10-20
- 刊出日期: 2023-05-10

Technology Developments and Applications of In-memory Computing Processors

1.
Beijiing Zhicun (Witmem) Technology Corporation Limited, Beijing 100191, China
2.
School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China

Funds: The Ministry of Science and Technology's Key Special Project (SQ2020YFF0404823)

摘要

摘要: 随着数据快速增长，冯诺依曼架构内存墙成为计算性能进一步提升的关键瓶颈。新型存算一体架构(包括存内计算(IMC)架构与近存计算(NMC)架构)，有望打破冯诺依曼架构瓶颈，大幅提高算力和能效。该文介绍了存算一体芯片的发展历程、研究现状以及基于各类存储器介质(如传统存储器DRAM, SRAM和Flash和新型非易失性存储器ReRAM, PCM, MRAM, FeFET等)的存内计算基本原理、优势与面临的问题。然后，以知存科技WTM2101量产芯片为例，重点介绍了存算一体芯片的电路结构与应用现状。最后，分析了存算一体芯片未来的发展前景与面临的挑战。
- 存算一体 /
- 存储墙 /
- 功耗墙 /
- 存内计算 /
- 近存计算 /
- 冯诺依曼架构瓶颈
Abstract: Memory wall has become one of the key challenges in Von Neumann architecture, memory-centric computing architectures, such as In-Memory Computing (IMC) and Near-Memory Computing (NMC) are expected to break the Von-Neumann bottleneck, improving computing performance and energy efficiency. The progress of memory-centric computing technology, as well as the principles, advantages and problems based on a variety of memory media, such as traditional memories (e.g., DRAM, SRAM and Flash) and emerging non-volatile memories (e.g., ReRAM, PCM, MRAM and FeFET) are introduced in this paper. Then, the circuit structure and main applications with IMC chips are highlighted, taking Witmem's product WTM2101 as an example. Finally, the future development prospects and challenges of the all-in-one chip are also analysed.
- Memory-centric computing /
- Memory wall /
- Power wall /
- In-Memory Computing (IMC) /
- Near-Memory Computing (NMC) /
- Von-Neumann bottleneck

HTML全文

图 1 计算架构的演变示意图

下载: 全尺寸图片幻灯片

图 2 基于不同存储介质的计算架构演变图^[39]

下载: 全尺寸图片幻灯片

图 3 基于SRAM的存内计算单元结构

下载: 全尺寸图片幻灯片

图 4 基于DRAM的存内计算基本原理^[43]

下载: 全尺寸图片幻灯片

图 5 基于ReRAM的存内计算阵列结构与测试芯片

下载: 全尺寸图片幻灯片

图 6 基于MRAM的存内计算阵列布局图、显微图和结构^[49]

下载: 全尺寸图片幻灯片

图 7 基于NOR Flash的存内计算技术原理与相关产品

下载: 全尺寸图片幻灯片

图 8 WTM2101的阵列结构与芯片架构

下载: 全尺寸图片幻灯片

图 9 WTM2101存内计算芯片8 bit精度运算测试结果

下载: 全尺寸图片幻灯片

图 10 搭载WTM2101的耳机产品与WTM2101自动化部署流程

下载: 全尺寸图片幻灯片

图 11 WTM2101降噪性能图

下载: 全尺寸图片幻灯片

表 1 基于不同存储介质的存内计算芯片性能比较

标准	SRAM	DRAM	Flash	ReRAM	PCM	FeFET	MRAM
非易失性	否	否	是	是	是	是	是
多比特存储能力	否	否	是	是	是	是	否
面积效率	低	一般	高	高	高	高	高
功耗效率	低	低	高	高	高	高	高
工艺微缩性	好	好	较差	好	较好	好	好
成本	高	较高	低	低	较低	低	低
技术成熟度	测试芯片	测试芯片	量产产品	测试芯片	测试芯片	器件	测试芯片

下载: 导出CSV

表 2 神经网络的累计余弦相似度

神经网络	累计余弦相似度
第0层	0.993
第1层	0.996
第2层	0.997
第3层	0.998
第4层	0.998
第5层	0.997
第6层	0.994
整个神经网络	0.994

下载: 导出CSV

表 3 WTM2101与市场同类产品的性能比较

标准	市场现有同类产品		WTM2101
标准	算力复杂度(Mops)	功耗(mA)	算力复杂度(Mops)	功耗(mA)
语音激活检测	0.1	0.1	0.1	0.07
语音唤醒	20	0.6	400	0.4
40命令词识别	30	2	400	0.6
100命令词识别	100	10	400	0.8
环境去噪	150	15	800	1
声纹识别	1500	150	1500	2

下载: 导出CSV

参考文献(54)

[1]	CHEN C.L.Philip and ZHANG Chunyang Data-intensive applications, challenges, techniques and technologies: A survey on Big Data[J]. Information Sciences, 2014, 275: 314–347. doi: 10.1016/j.ins.2014.01.015
[2]	WULF W A and MCKEE S A. Hitting the memory wall: Implications of the obvious[J]. ACM SIGARCH Computer Architecture News, 1995, 23(1): 20–24. doi: 10.1145/216585.216588
[3]	ZIDAN M A, STRACHAN J P, and LU W D. The future of electronics based on memristive systems[J]. Nature Electronics, 2018, 1(1): 22–29. doi: 10.1038/s41928-017-0006-8
[4]	张和. 基于MRAM和SRAM的混合器件存算一体芯片设计[D]. [博士论文], 北京航空航天大学, 2021. ZHANG He. Computing in memory chip design with hybrid devices based on MRAM and SRAM [D]. [Ph. D. dissertation], Beihang University, 2021.
[5]	NAIR R, ANTAO S F, BERTOLLI C, et al. Active memory cube: A processing-in-memory architecture for exascale systems[J]. IBM Journal of Research and Development, 2015, 59(2/3): 17:1–17:14. doi: 10.1147/JRD.2015.2409732
[6]	AKIN B, FRANCHETTI F, and HOE J C. Data reorganization in memory using 3D-stacked DRAM[J]. ACM SIGARCH Computer Architecture News, 2015, 43(3S): 131–143. doi: 10.1145/2872887.2750397
[7]	FARMAHINI-FARAHANI A, AHN J H, MORROW K, et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules[C]. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Burlingame, USA, 2015: 283–295.
[8]	GAO M, AYERS G, and KOZYRAKIS C. Practical near-data processing for in-memory analytics frameworks[C]. 2015 International Conference on Parallel Architecture and Compilation (PACT), San Francisco, USA, 2015: 113–124.
[9]	KAUTZ W H. Cellular logic-in-memory arrays[J]. IEEE Transactions on Computers, 1969, C-18(8): 719–727. doi: 10.1109/T-C.1969.222754
[10]	STONE H S. A logic-in-memory computer[J]. IEEE Transactions on Computers, 1970, C-19(1): 73–78. doi: 10.1109/TC.1970.5008902
[11]	PATTERSON D, ANDERSON T, CARDWELL N, et al. Intelligent RAM (IRAM): Chips that remember and compute[C]. 1997 IEEE International Solids-State Circuits Conference. Digest of Technical Papers, San Francisco, USA, 1997: 224–225.
[12]	KANG Yi, HUANG Wei, YOO S M, et al. FlexRAM: Toward an advanced intelligent memory system[C]. 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Austin, USA, 1999: 192–201.
[13]	LI Shuangchen, NIU Dimin, MALLADI K T, et al. DRISA: A DRAM-based reconfigurable in-situ accelerator[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, 2017: 288–301.
[14]	SKARLATOS D, KIM N S, TORRELLAS J. Pageforge: a near-memory content-aware page-merging architecture[C]//Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. 2017: 302−314.
[15]	AGRAWAL S R, IDICULA S, RAGHAVAN A, et al. A many-core architecture for in-memory data processing[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, 2017: 245–258.
[16]	SEBASTIAN A, TUMA T, PAPANDREOU N, et al. Temporal correlation detection using computational phase-change memory[J]. Nature Communications, 2017, 8(1): 1115. doi: 10.1038/s41467-017-01481-9
[17]	BISWAS A and CHANDRAKASAN A P. CONV-SRAM: An energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks[J]. IEEE Journal of Solid-State Circuits, 2019, 54(1): 217–230. doi: 10.1109/JSSC.2018.2880918
[18]	LIU Qi, GAO Bin, YAO Peng, et al. 33.2 A fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing[C]. 2020 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2020: 500–502.
[19]	ZHU Haozhe, JIAO Bo, ZHANG Jinshan, et al. COMB-MCM: Computing-on-memory-boundary NN processor with bipolar bitwise sparsity optimization for scalable multi-chiplet-module edge machine learning[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 1–3.
[20]	TAN Fei, WANG Yiming, YANG Yiming, et al. A ReRAM-based computing-in-memory convolutional-macro with customized 2T2R bit-cell for AIoT chip IP applications[J]. IEEE Transactions on Circuits and Systems II:Express Briefs, 2020, 67(9): 1534–1538. doi: 10.1109/TCSII.2020.3013336
[21]	GUO Ruiqi, LIU Yonggang, ZHENG Shixuan, et al. A 5.1pJ/neuron 127.3us/inference RNN-based speech recognition processor using 16 computing-in-memory SRAM macros in 65nm CMOS[C]. 2019 Symposium on VLSI Circuits, Kyoto, Japan, 2019: C120–C121.
[22]	WAN Weier, KUBENDRAN R, GAO Bin, et al. A voltage-mode sensing scheme with differential-row weight mapping for energy-efficient RRAM-based in-memory computing[C]. 2020 IEEE Symposium on VLSI Technology, Honolulu, USA, 2020: 1–2.
[23]	SHEN Wensheng, HUANG Peng, WANG Xiangyu, et al. A novel capacitor-based stateful logic operation scheme for in-memory computing in 1T1RRRAM array[C]. 2020 4th IEEE Electron Devices Technology & Manufacturing Conference (EDTM), Penang, Malaysia, 2020: 1–4.
[24]	YAO Peng, WU Huaqiang, GAO Bin, et al. Fully hardware-implemented memristor convolutional neural network[J]. Nature, 2020, 577(7792): 641–646. doi: 10.1038/s41586-020-1942-4
[25]	MERRIKH-BAYAT F, GUO Xinjie, KLACHKO M, et al. High-performance mixed-signal neurocomputing with nanoscale floating-gate memory cell arrays[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(10): 4782–4790. doi: 10.1109/TNNLS.2017.2778940
[26]	FICK L, BLAAUW D, SYLVESTER D, et al. Analog in-memory subthreshold deep neural network accelerator[C]. 2017 IEEE Custom Integrated Circuits Conference, Austin, USA, 2017: 1–4.
[27]	MAHMOODI M R and STRUKOV D. An ultra-low energy internally analog, externally digital vector-matrix multiplier based on NOR flash memory technology[C]. The 55th Annual Design Automation Conference, San Francisco, USA, 2018: 33.
[28]	KANG Mingu, GONUGONDLA S K, PATIL A, et al. A multi-functional in-memory inference processor using a standard 6T SRAM array[J]. IEEE Journal of Solid-State Circuits, 2018, 53(2): 642–655. doi: 10.1109/JSSC.2017.2782087
[29]	YANG Jun, KONG Yuyao, WANG Zhen, et al. 24.4 sandwich-RAM: An energy-efficient in-memory BWN architecture with pulse-width modulation[C]. 2019 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2019: 394–396.
[30]	CHIU Y C, ZHANG Zhixiao, CHEN Jiajing, et al. A 4-Kb 1-to-8-bit configurable 6T SRAM-based computation-in-memory unit-macro for CNN-based AI edge processors[J]. IEEE Journal of Solid-State Circuits, 2020, 55(10): 2790–2801. doi: 10.1109/JSSC.2020.3005754
[31]	JIA Hongyang, VALAVI H, TANG Yinqi, et al. A programmable heterogeneous microprocessor based on bit-scalable in-memory computing[J]. IEEE Journal of Solid-State Circuits, 2020, 55(9): 2609–2621. doi: 10.1109/JSSC.2020.2987714
[32]	JIANG Zhewei, YIN Shihui, SEO J S, et al. C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism[J]. IEEE Journal of Solid-State Circuits, 2020, 55(7): 1888–1897. doi: 10.1109/JSSC.2020.2992886
[33]	YIN Shihui, JIANG Zhewei, SEO J S, et al. XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks[J]. IEEE Journal of Solid-State Circuits, 2020, 55(6): 1733–1743. doi: 10.1109/JSSC.2019.2963616
[34]	YAN Bonan, YANG Qing, CHEN Weihao, et al. RRAM-based spiking nonvolatile computing-in-memory processing engine with precision-configurable in situ nonlinear activation[C]. 2019 Symposium on VLSI Technology, Kyoto, Japan, 2019: T86-T87.
[35]	XUE Chengxin, CHEN Weihao, LIU J S, et al. 24.1 A 1Mb multibit RRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors[C]. 2019 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2019: 388–390.
[36]	XUE Chengxin, CHEN Weihao, LIU J S, et al. Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors[J]. IEEE Journal of Solid-State Circuits, 2020, 55(1): 203–215. doi: 10.1109/JSSC.2019.2951363
[37]	ZHA Yue, NOWAK E, and LI Jing. Liquid silicon: A nonvolatile fully programmable processing-in-memory processor with monolithically integrated ReRAM[J]. IEEE Journal of Solid-State Circuits, 2020, 55(4): 908–919. doi: 10.1109/JSSC.2019.2963005
[38]	ZHANG H, LIU J, BAI J, et al. HD-CIM: Hybrid-Device Computing-In-Memory Structure Based on MRAM and SRAM to Reduce Weight Loading Energy of Neural Networks[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(11): 4465–4474.
[39]	SEBASTIAN A, LE GALLO M, KHADDAM-ALJAMEH R, et al. Memory devices and applications for in-memory computing[J]. Nature Nanotechnology, 2020, 15(7): 529–544. doi: 10.1038/s41565-020-0655-z
[40]	DONG Qing, SINANGIL M E, ERBAGCI B, et al. 15.3 A 351TOPS/W and 372.4GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications[C]. 2020 IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, USA, 2020: 242–244.
[41]	CHIH Y D, LEE P H, FUJIWARA H, et al. 16.4 an 89TOPS/W and 16.3TOPS/mm² all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications[C]. 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2021: 252–254.
[42]	YAN Bonan, HSU J L, YU Pangcheng, et al. A 1.041-Mb/mm² 27.38-TOPS/W Signed-INT8 dynamic-logic-based ADC-less SRAM compute-in-Memory Macro in 28nm with reconfigurable bitwise operation for AI and embedded applications[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 188–190.
[43]	SESHADRI V, LEE D, MULLINS T, et al. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology[C]. The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, 2017: 273–287.
[44]	NATSUI M, SUZUKI D, SAKIMURA N, et al. Nonvolatile logic-in-memory LSI using cycle-based power gating and its application to motion-vector prediction[J]. IEEE Journal of Solid-State Circuits, 2015, 50(2): 476–489. doi: 10.1109/JSSC.2014.2362853
[45]	ZHANG He, KANG Wang, CAO Kaihua, et al. Spintronic processing unit in spin transfer torque magnetic random access memory[J]. IEEE Transactions on Electron Devices, 2019, 66(4): 2017–2022. doi: 10.1109/TED.2019.2898391
[46]	ZHANG He, KANG Wang, WANG Lezhi, et al. Stateful reconfigurable logic via a single-voltage-gated spin hall-effect driven magnetic tunnel junction in a spintronic memory[J]. IEEE Transactions on Electron Devices, 2017, 64(10): 4295–4301. doi: 10.1109/TED.2017.2726544
[47]	WANG Haotian, KANG Wang, PAN Biao, et al. Spintronic computing-in-memory architecture based on voltage-controlled spin–orbit torque devices for binary neural networks[J]. IEEE Transactions on Electron Devices, 2021, 68(10): 4944–4950. doi: 10.1109/TED.2021.3102896
[48]	DEAVILLE P, ZHANG Bonan, CHEN L Y, et al. A maximally row-parallel MRAM in-memory-computing macro addressing readout circuitsensitivity and area[C]. ESSCIRC 2021-IEEE 47th European Solid State Circuits Conference (ESSCIRC), Grenoble, France, 2021: 75–78.
[49]	JUNG S, LEE H, MYUNG S, et al. A crossbar array of magnetoresistive memory devices for in-memory computing[J]. Nature, 2022, 601(7892): 211–216. doi: 10.1038/s41586-021-04196-6
[50]	SONG K M, JEONG J S, PAN Biao, et al. Skyrmion-based artificial synapses for neuromorphic computing[J]. Nature Electronics, 2020, 3(3): 148–155. doi: 10.1038/s41928-020-0385-0
[51]	CHEN Chao, LIN Tao, NIU Jianteng, et al. Surface acoustic wave controlled skyrmion-based synapse devices[J]. Nanotechnology, 2022, 33(11): 115205. doi: 10.1088/1361-6528/ac3f14
[52]	HUANG Yangqi, KANG Wang, ZHANG Xichao, et al. Magnetic skyrmion-based synaptic devices[J]. Nanotechnology, 2017, 28(8): 08LT02. doi: 10.1088/1361-6528/aa5838
[53]	HU Hanwen, WANG Weichen, CHEN C K, et al. A 512 Gb in-memory-computing 3D-NAND flash supporting similar-vector-matching operations on edge-AI devices[C]. 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 138–140.
[54]	KIM M, LIU Muqing, EVERSON L R, et al. An embedded NAND flash-based compute-in-memory array demonstrated in a standard logic process[J]. IEEE Journal of Solid-State Circuits, 2022, 57(2): 625–638. doi: 10.1109/JSSC.2021.3098671