Application and Research Progress of Approximate Computing as a New Computing Paradigm in AI Acceleration Systems

GONG Yu, WANG Liping, WANG You, LIU Weiqiang

Citation: GONG Yu, WANG Liping, WANG You, LIU Weiqiang. Application and Research Progress of Approximate Computing as a New Computing Paradigm in AI Acceleration Systems[J]. Journal of Electronics & Information Technology, 2023, 45(9): 3098-3108. doi: 10.11999/JEIT230352

doi: 10.11999/JEIT230352
Funds: The National Natural Science Foundation of China (62022041), The Fundamental Research Funds for the Central Universities (NJ2023020)
    Author biographies:

    GONG Yu: Male, Associate Researcher. Research interests: approximate computing and AI system chip design

    WANG Liping: Female, Ph.D. candidate. Research interests: approximate computing and AI system chip design

    WANG You: Male, Associate Researcher. Research interests: approximate computing and digital chip design

    LIU Weiqiang: Male, Professor. Research interests: approximate computing, integrated circuit design, etc.

    Corresponding author: LIU Weiqiang, liuweiqiang@nuaa.edu.cn

  • CLC number: TN402; TP183

  • Abstract: Deep learning has become one of the most important algorithms in artificial intelligence. As its application scenarios keep expanding, deep learning hardware grows ever larger in scale and its computational complexity rises by orders of magnitude, placing extremely high energy-efficiency demands on acceleration systems. In the post-Moore era, emerging computing paradigms are gradually replacing process scaling as the effective route to higher energy efficiency; approximate computing, which trades a limited loss of accuracy for substantial energy-efficiency gains, has become one of the most promising design approaches. Starting from the different design levels of deep learning acceleration systems, this paper first introduces the algorithmic characteristics of deep learning network models and reviews the research progress of quantization methods as algorithm-level approximation schemes. It then surveys, at the architecture and circuit levels, the approximate circuits and architectures adopted by current deep learning accelerators for image, speech, and other applications, and, from the perspective of hierarchical design, the system-level design methodologies for approximate computing together with key problems and research progress in the EDA field. Finally, it offers an outlook on future directions in this area, aiming to promote the application of the new approximate computing paradigm in deep learning acceleration systems.
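To make the algorithm-level approximation discussed in the abstract concrete, the following is a minimal sketch of symmetric uniform per-tensor quantization; the function names, bit widths, and random test tensor are illustrative assumptions, not a scheme taken from the paper.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 8):
    """Symmetric uniform per-tensor quantization of float weights.

    Illustrative sketch only: real deployments typically add per-channel
    scales, calibration data, and quantization-aware training.
    """
    qmax = 2 ** (bits - 1) - 1                            # e.g. 127 for 8 bits
    scale = max(float(np.max(np.abs(w))), 1e-12) / qmax   # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# The reconstruction error below is the accuracy traded for the large
# energy savings of low-bit-width integer arithmetic.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_symmetric(w, bits=4)
print("max abs error:", float(np.max(np.abs(w - dequantize(q, s)))))
```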
  • Figure 1. Approximate computing design flow and evaluation methods
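Figure 1 itself is not reproduced here. As a hedged illustration of the evaluation side of such a flow, the sketch below measures two quality metrics standard in the approximate-computing literature, error rate (ER) and mean error distance (MED), on a simplified lower-part OR adder; the adder variant and the test setup are our assumptions, not a design from the paper.

```python
import numpy as np

def loa_add(a: np.ndarray, b: np.ndarray, k: int = 4) -> np.ndarray:
    """Simplified Lower-part OR Adder (LOA)-style approximation: OR the k
    least-significant bits instead of adding them, and add the remaining
    high bits exactly (the carry into the high part is dropped)."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)            # approximate low part
    high = ((a >> k) + (b >> k)) << k        # exact high part, no carry-in
    return high | low                        # low part (< 2**k) never overlaps

rng = np.random.default_rng(0)
a = rng.integers(0, 2**8, 100_000)
b = rng.integers(0, 2**8, 100_000)

err = np.abs((a + b) - loa_add(a, b))
print("error rate (ER):", float(np.mean(err > 0)))      # fraction of wrong sums
print("mean error distance (MED):", float(err.mean()))  # average |error|
```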

Publication history
  • Received: 2023-05-04
  • Revised: 2023-08-23
  • Published online: 2023-08-25
  • Issue published: 2023-09-27
