
NAS4CIM: Tailored Neural Network Architecture Search for RRAM-Based Compute-in-Memory Chips

LI Yuankun, WANG Ze, ZHANG Qingtian, GAO Bin, WU Huaqiang

Citation: LI Yuankun, WANG Ze, ZHANG Qingtian, GAO Bin, WU Huaqiang. NAS4CIM: Tailored Neural Network Architecture Search for RRAM-Based Compute-in-Memory Chips[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250978


doi: 10.11999/JEIT250978 cstr: 32379.14.JEIT250978
Funds: The Joint Funds of the National Natural Science Foundation of China (U2341221)
    About the authors:

    LI Yuankun: male, Ph.D. candidate. His research interests include co-optimization of compute-in-memory architectures and algorithms

    WANG Ze: male, postdoctoral researcher. His research interests include mixed-signal compute-in-memory architectures and hybrid computing

    ZHANG Qingtian: male, associate research fellow. His research interests include compute-in-memory architectures and application algorithms

    GAO Bin: male, professor. His research interests include emerging memories, device modeling and simulation, and design-technology co-optimization

    WU Huaqiang: male, professor. His research interests include emerging memories, devices, process integration, architectures, and algorithms

    Corresponding author:

    ZHANG Qingtian, zhangqt0103@mail.tsinghua.edu.cn

  • Abstract: Compute-In-Memory (CIM) chips based on memristors (RRAM) are regarded as an important route to resolving the power limits and efficiency bottlenecks of on-orbit intelligent processing in satellite missions, and they show great potential for improving the inference efficiency of deep neural networks. However, given the complexity of on-board processing tasks, the RRAM-CIM architecture requires matching neural network architectures to realize its energy-efficiency advantage. In the RRAM-CIM setting, existing methods mostly rely on one-shot training with random sampling when searching for task performance, which easily destabilizes the training process; for hardware performance modeling, they typically depend on predictors or operator-level models, where the former incurs a high initial cost and the latter ignores the hardware behavior of the network as a whole. To address these issues, this paper proposes the NAS4CIM search framework and introduces a semi-decoupled, Distillation-Enhanced Gradient-coefficient Supernet Training method (DDE-GSCST), which effectively mitigates interference among candidate operators and improves the stability and robustness of the search process. In addition, a Top-K statistics based operator selection strategy is adopted, which significantly improves hardware performance while preserving task accuracy. Experiments on the CIFAR-10 and ImageNet datasets show that, under the same hardware architecture, the proposed method improves the best accuracy by 2.2% over existing methods and reduces the Energy-Delay Product (EDP) by 33.3%. Moreover, measured results on real RRAM macros agree with the simulation results, further validating the effectiveness of the method.
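The Top-K operator selection strategy mentioned above can be illustrated with a short sketch: after supernet training, each candidate operator in a stage carries a learned coefficient, and selection first keeps the K most promising operators by coefficient before weighing in their hardware cost. The function name, the scoring rule, and the example numbers below are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical sketch of Top-K statistics based operator selection for one
    # supernet stage. alpha holds learned operator coefficients; edp holds each
    # operator's estimated energy-delay product on the target CIM hardware.
    import torch

    def select_operator(alpha: torch.Tensor, edp: torch.Tensor,
                        k: int = 3, beta: float = 0.5) -> int:
        """Keep the Top-K operators by coefficient, then pick the survivor
        with the best accuracy/hardware trade-off (scoring rule is assumed)."""
        probs = torch.softmax(alpha, dim=0)      # confidence of each candidate
        topk = torch.topk(probs, k).indices      # restrict to K most promising
        edp_k = edp[topk] / edp[topk].max()      # put EDP on a comparable scale
        score = probs[topk] - beta * edp_k       # high coefficient, low EDP win
        return topk[score.argmax()].item()

    alpha = torch.tensor([1.2, 0.3, 0.9, -0.5])  # e.g. conv3, conv5, resblock, none
    edp = torch.tensor([0.8, 1.6, 1.1, 0.05])
    print(select_operator(alpha, edp, k=2))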
  • Figure 1  Performance comparison of different network architectures on the same CIM architecture

    Figure 2  NAS4CIM workflow

    Figure 3  Illustration of the semi-decoupled, distillation-enhanced gradient-coefficient supernet training

    Figure 4  Illustration of Top-K based screening of operator hardware coefficients

    Figure 5  Accuracy-first search on CIFAR-10: DDE-GSCST warm-up versus the conventional method

    Figure 6  Coefficient-only training versus alternating training on CIFAR-10

    Algorithm 1  Pseudocode of the DDE-GSCST warm-up training stage

     Input:
      Training dataset D
      Teacher network with L stages, T = {T_1, T_2, ···, T_L}
      Supernet module set S = {S_1, S_2, ···, S_L}, where each S_i holds candidate weight parameters θ_i and operator coefficients α_i
     1. for i = 1, 2, ···, L do
     2.  Replace the i-th stage of the teacher network with the supernet module S_i
     3.  Freeze the parameters of the remaining teacher stages
     4.  repeat until convergence do
     5.   Sample a batch from D and compute the teacher intermediate feature map $ f_T = T_{\{1,2,\cdots,i\}}(x) $
     6.   Compute the supernet module output feature map $ f_s = (T_{\{1,2,\cdots,i-1\}} \circ S_i)(x) $
     7.   Compute the student network output $ y_s = (T_{\{1,\cdots,i-1\}} \circ S_i \circ T_{\{i+1,\cdots,L\}})(x) $
     8.   Compute the task loss $ L_{\mathrm{task}} = \mathrm{CE}(y_s, y_{\mathrm{label}}) $
     9.   Compute the distillation loss $ L_{\mathrm{distill}} = \mathrm{MSE}(f_s, f_T) $
     10.  Total loss $ L = L_{\mathrm{task}} + \lambda L_{\mathrm{distill}} $
     11.  Compute gradients from L; keep the operator coefficients α_i fixed and update the weight parameters θ_i
     12.  Re-run the forward pass and recompute the losses
     13.  Compute gradients from L_task; keep the weight parameters θ_i fixed and update the operator coefficients α_i
     14. end repeat
     15. Save the trained parameters and operator coefficients of this stage
     16. end for
     Output:
      Warmed-up supernet weights {θ_i} and operator coefficients {α_i}
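To make the inner loop of Algorithm 1 concrete, the following is a minimal PyTorch sketch of one iteration for a single stage i, assuming a weighted-sum supernet block in the style of gradient-based NAS. The toy linear layers standing in for the teacher stages, and all names and hyperparameters, are illustrative; the paper's actual models are not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SupernetBlock(nn.Module):
        """Weighted mixture of candidate operators, DARTS-style."""
        def __init__(self, dim: int, n_ops: int = 3):
            super().__init__()
            self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_ops)])
            self.alpha = nn.Parameter(torch.zeros(n_ops))  # operator coefficients

        def forward(self, x):
            w = torch.softmax(self.alpha, dim=0)
            return sum(wi * op(x) for wi, op in zip(w, self.ops))

    dim, lam = 32, 0.5
    teacher_pre = nn.Linear(dim, dim)    # frozen stages T_{1..i-1}
    teacher_i = nn.Linear(dim, dim)      # frozen stage T_i, source of f_T
    teacher_post = nn.Linear(dim, 10)    # frozen stages T_{i+1..L}
    for m in (teacher_pre, teacher_i, teacher_post):
        m.requires_grad_(False)          # step 3: freeze the teacher

    block = SupernetBlock(dim)           # S_i, replacing stage i (step 2)
    theta = [p for n, p in block.named_parameters() if n != "alpha"]
    opt_theta = torch.optim.SGD(theta, lr=0.1)
    opt_alpha = torch.optim.Adam([block.alpha], lr=3e-3)

    x = torch.randn(8, dim)
    y_label = torch.randint(0, 10, (8,))

    def losses():
        h = teacher_pre(x)
        f_t = teacher_i(h)               # teacher feature map f_T (step 5)
        f_s = block(h)                   # supernet feature map f_s (step 6)
        y_s = teacher_post(f_s)          # student output y_s (step 7)
        l_task = F.cross_entropy(y_s, y_label)     # step 8
        l_distill = F.mse_loss(f_s, f_t)           # step 9
        return l_task, l_task + lam * l_distill    # step 10

    # Step 11: update weights theta_i with the total loss, alpha_i fixed.
    _, l_total = losses()
    opt_theta.zero_grad(); l_total.backward(); opt_theta.step()

    # Steps 12-13: re-run the forward pass, then update alpha_i with the
    # task loss only, theta_i fixed (its gradients are simply not applied).
    l_task, _ = losses()
    opt_alpha.zero_grad(); l_task.backward(); opt_alpha.step()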

    Table 1  Network search settings for each task

    Metric           FashionMNIST search space             CIFAR-10 search space                           ImageNet search space
    Kernel sizes     3, 5                                  1, 3                                            1, 3, 5
    Operator types   none, standard conv, residual conv    none, standard conv, standard residual block    none, standard residual block, bottleneck residual block
    Channel counts   16, 32, 64                            16, 32, 48, 64                                  16, 32, 48, 64, 80, 96
    Max layers       3                                     9                                               8
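The Table 1 search spaces amount to small configuration objects; a sketch of the CIFAR-10 column written as a plain Python dict (key names are hypothetical):

    cifar10_space = {
        "kernel_sizes": [1, 3],
        "operators": ["none", "standard_conv", "standard_residual_block"],
        "channels": [16, 32, 48, 64],
        "max_layers": 9,
    }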

    Table 2  Hardware parameter settings

    Hardware parameter    Value
    Crossbar array size   128×128
    ADC resolution        8 bit
    Number of ADCs        8
    DAC resolution        1 bit
    Number of DACs        8×128
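One way to see how the Table 2 parameters constrain network mapping is a back-of-the-envelope calculation of how many 128×128 crossbars a convolution layer occupies once its weights are unrolled into a (k·k·C_in) × C_out matrix. The helper below uses a simple tiling rule as an assumption for illustration; it is not the paper's behavior-level cost model.

    import math

    ARRAY_ROWS = ARRAY_COLS = 128  # crossbar array size from Table 2

    def crossbars_for_conv(c_in: int, c_out: int, k: int) -> int:
        rows = k * k * c_in        # unrolled input dimension
        cols = c_out               # one column per output channel
        return math.ceil(rows / ARRAY_ROWS) * math.ceil(cols / ARRAY_COLS)

    # A 3x3 convolution with 64 input and 64 output channels:
    print(crossbars_for_conv(64, 64, 3))  # ceil(576/128) * ceil(64/128) = 5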

    Table 3  Comparison of search results on the CIFAR-10 dataset

    Method                     Accuracy (%)  EDP (ms·mJ)  Params (K)  Compute (MOPs)  Search time (h)
    NACIM                      73.9          1.55         -           -               59
    UAE                        83.0          -            -           -               154
    NAS4RRAM                   84.4          -            -           -               289
    CARS (accuracy-first)      88.0          11.03        -           -               72
    Gibbon (accuracy-first)    88.3          14.33        -           -               6
    Gibbon (EDP-first)         84.6          0.24         -           -               6
    NAS4CIM (accuracy-first)   90.5          4.23         268.35      81.10           6
    NAS4CIM (EDP-first)        85.9          0.16         65.59       19.76           6
    NAS4CIM (balanced)         89.3          0.97         135.23      49.12           6
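The EDP column in Tables 3-5 is the standard energy-delay product: per-inference energy multiplied by per-inference latency, here in ms·mJ. A trivial sketch (the example numbers are illustrative, not taken from the paper):

    def edp(energy_mj: float, latency_ms: float) -> float:
        """Energy-delay product in ms*mJ, the unit used in the tables."""
        return energy_mj * latency_ms

    print(edp(1.39, 0.70))  # 0.973 ms*mJ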

    Table 4  Search results and on-chip measurement results on the FashionMNIST dataset

    Method                     Simulated accuracy (%)  Measured accuracy (%)  EDP (ms·mJ)  Params (K)  Compute (MOPs)
    Teacher network            92.8                    -                      -            270.90      81.63
    NAS4CIM (accuracy-first)   90.1                    89.8                   0.21         65.85       15.57
    NAS4CIM (EDP-first)        88.7                    88.3                   0.16         65.08       13.99

    Table 5  Search results on the ImageNet dataset

    Method                     Accuracy (%)  EDP (ms·mJ)  Params (M)  Compute (GOPs)
    Teacher network            71.6          -             11.50      3.59
    NAS4CIM (accuracy-first)   70.0          629.58        3.84       2.78
    NAS4CIM (EDP-first)        66.6          504.74        3.64       1.94
  • [1] SANTORO G, TURVANI G, and GRAZIANO M. New logic-in-memory paradigms: An architectural and technological perspective[J]. Micromachines, 2019, 10(6): 368. doi: 10.3390/mi10060368.
    [2] LIN Hairong, DUAN Chenxing, DENG Xiaoheng, et al. Dual-memristor brain-like chaotic neural network and its application in IoMT data privacy protection[J]. Journal of Electronics & Information Technology, 2025, 47(7): 2194–2210. doi: 10.11999/JEIT241133.
    [3] JIANG Zhixing, XI Yue, TANG Jianshi, et al. Review of recent research on memristors and computing-in-memory applications[J]. Science & Technology Review, 2024, 42(2): 31–49. doi: 10.3981/j.issn.1000-7857.2024.02.004.
    [4] LI Bing, WU Kangjun, WANG Jing, et al. Design of graph convolutional network accelerator based on resistive random access memory[J]. Journal of Electronics & Information Technology, 2023, 45(1): 106–115. doi: 10.11999/JEIT211435.
    [5] WAN W, KUBENDRAN R, SCHAEFER C, et al. A compute-in-memory chip based on resistive random-access memory[J]. Nature, 2022, 608(7923): 504–512. doi: 10.1038/s41586-022-04992-8.
    [6] KUANG Xianyan, HUAN Xianglan, XIAO Hongbiao, et al. Design of combinational logic circuit using multi-terminal memristor[J]. Electronic Components and Materials, 2024, 43(8): 1024–1030. doi: 10.14106/j.cnki.1001-2028.2024.1567.
    [7] CHEN Changlin, LUO Changhang, LIU Sen, et al. Review on the memristor based neuromorphic chips[J]. Journal of National University of Defense Technology, 2023, 45(1): 1–14. doi: 10.11887/j.cn.202301001.
    [8] PANDAY D K, SUMAN S, KHAN S, et al. Fabrication and characterization of alumina based resistive RAM for space applications[C]. The IEEE 24th International Conference on Nanotechnology (NANO), Gijon, Spain, 2024: 253–257. doi: 10.1109/NANO61778.2024.10628609.
    [9] YAO Peng, WU Huaqiang, GAO Bin, et al. Fully hardware-implemented memristor convolutional neural network[J]. Nature, 2020, 577(7792): 641–646. doi: 10.1038/s41586-020-1942-4.
    [10] CAI Han, ZHU Ligeng, and HAN Song. ProxylessNAS: Direct neural architecture search on target task and hardware[C]. The 7th International Conference on Learning Representations (ICLR), New Orleans, USA, 2019.
    [11] NEGI S, CHAKRABORTY I, ANKIT A, et al. NAX: Neural architecture and memristive xbar based accelerator co-design[C]. The 59th ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2022: 451–456. doi: 10.1145/3489517.3530476.
    [12] JIANG Weiwen, LOU Qiuwen, YAN Zheyu, et al. Device-circuit-architecture co-exploration for computing-in-memory neural accelerators[J]. IEEE Transactions on Computers, 2021, 70(4): 595–605. doi: 10.1109/TC.2020.2991575.
    [13] YUAN Zhihang, LIU Jingze, LI Xingchen, et al. NAS4RRAM: Neural network architecture search for inference on RRAM-based accelerators[J]. Science China Information Sciences, 2021, 64(6): 160407. doi: 10.1007/s11432-020-3245-7.
    [14] SUN Hanbo, ZHU Zhenhua, WANG Chenyu, et al. Gibbon: An efficient co-exploration framework of NN model and processing-in-memory architecture[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(11): 4075–4089. doi: 10.1109/TCAD.2023.3262201.
    [15] KRIZHEVSKY A. Learning multiple layers of features from tiny images[EB/OL]. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf, 2009.
    [16] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. The 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255. doi: 10.1109/CVPR.2009.5206848.
    [17] XIAO Han, RASUL K, and VOLLGRAF R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms[EB/OL]. arXiv preprint arXiv: 1708.07747, 2017. doi: 10.48550/arXiv.1708.07747.
    [18] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
    [19] LIU Hanxiao, SIMONYAN K, and YANG Yiming. DARTS: Differentiable architecture search[C]. The 7th International Conference on Learning Representations, New Orleans, USA, 2019.
    [20] ESSER S K, MCKINSTRY J L, BABLANI D, et al. Learned step size quantization[C]. The 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020.
    [21] ZHANG Yibei, ZHANG Qingtian, QIN Qi, et al. An RRAM retention prediction framework using a convolutional neural network based on relaxation behavior[J]. Neuromorphic Computing and Engineering, 2023, 3(1): 014011. doi: 10.1088/2634-4386/acb965.
    [22] ZHU Zhenhua, SUN Hanbo, XIE Tongxin, et al. MNSIM 2.0: A behavior-level modeling tool for processing-in-memory architectures[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(11): 4112–4125. doi: 10.1109/TCAD.2023.3251696.
    [23] SHI Yuanming, ZHU Jingyang, JIANG Chunxiao, et al. Satellite edge artificial intelligence with large models: Architectures and technologies[J]. Science China Information Sciences, 2025, 68(7): 170302. doi: 10.1007/s11432-024-4425-y.
Publication history
  • Received: 2025-09-24
  • Revised: 2025-12-31
  • Published online: 2026-01-09
