Application and Research Progress of Approximate Computing as a New Computing Paradigm in AI Acceleration Systems

GONG Yu, WANG Liping, WANG You, LIU Weiqiang

Citation: GONG Yu, WANG Liping, WANG You, LIU Weiqiang. Application and Research Progress of Approximate Computing as a New Computing Paradigm in AI Acceleration Systems[J]. Journal of Electronics & Information Technology, 2023, 45(9): 3098-3108. doi: 10.11999/JEIT230352

doi: 10.11999/JEIT230352
Funds: The National Natural Science Foundation of China (62022041), The Fundamental Research Funds for the Central Universities (NJ2023020)
    Author biographies:

    GONG Yu: Male, Associate Researcher. Research interests: approximate computing and AI system chip design

    WANG Liping: Female, Ph.D. candidate. Research interests: approximate computing and AI system chip design

    WANG You: Male, Associate Researcher. Research interests: approximate computing and digital chip design

    LIU Weiqiang: Male, Professor. Research interests: approximate computing, integrated circuit design, etc.

    Corresponding author: LIU Weiqiang, liuweiqiang@nuaa.edu.cn

  • CLC number: TN402; TP183

  • Abstract: Deep learning has become one of the most important algorithms in artificial intelligence. As its application scenarios keep expanding, deep learning hardware grows ever larger in scale and its computational complexity rises by orders of magnitude, placing extremely high energy-efficiency demands on acceleration systems. In the post-Moore era, emerging computing paradigms are gradually replacing process scaling as the effective route to higher energy efficiency; approximate computing, which trades a limited loss of accuracy for substantial energy-efficiency gains, has become one of the most promising design approaches. Starting from the different design levels of deep learning acceleration systems, this paper first introduces the algorithmic characteristics of deep learning network models and reviews the research progress of quantization methods as algorithm-level approximation schemes. It then surveys, at the architecture and circuit levels, the approximate circuits and architectures adopted by current deep learning accelerators for image, speech, and other applications, and, from the perspective of hierarchical design, the system-level design methodologies for approximate computing together with key problems and research progress in the EDA field. Finally, it offers an outlook on future directions in this area, aiming to promote the application of the new approximate computing paradigm in deep learning acceleration systems.
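To make the algorithm-level approximation discussed in the abstract concrete, the following is a minimal sketch of symmetric uniform per-tensor quantization; the function names, bit widths, and random test tensor are illustrative assumptions, not a scheme taken from the paper.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 8):
    """Symmetric uniform per-tensor quantization of float weights.

    Illustrative sketch only: real deployments typically add per-channel
    scales, calibration data, and quantization-aware training.
    """
    qmax = 2 ** (bits - 1) - 1                            # e.g. 127 for 8 bits
    scale = max(float(np.max(np.abs(w))), 1e-12) / qmax   # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# The reconstruction error below is the accuracy traded for the large
# energy savings of low-bit-width integer arithmetic.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_symmetric(w, bits=4)
print("max abs error:", float(np.max(np.abs(w - dequantize(q, s)))))
```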
  • Figure 1. Approximate computing design flow and evaluation methods
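Figure 1 itself is not reproduced here. As a hedged illustration of the evaluation side of such a flow, the sketch below measures two quality metrics standard in the approximate-computing literature, error rate (ER) and mean error distance (MED), on a simplified lower-part OR adder; the adder variant and the test setup are our assumptions, not a design from the paper.

```python
import numpy as np

def loa_add(a: np.ndarray, b: np.ndarray, k: int = 4) -> np.ndarray:
    """Simplified Lower-part OR Adder (LOA)-style approximation: OR the k
    least-significant bits instead of adding them, and add the remaining
    high bits exactly (the carry into the high part is dropped)."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)            # approximate low part
    high = ((a >> k) + (b >> k)) << k        # exact high part, no carry-in
    return high | low                        # low part (< 2**k) never overlaps

rng = np.random.default_rng(0)
a = rng.integers(0, 2**8, 100_000)
b = rng.integers(0, 2**8, 100_000)

err = np.abs((a + b) - loa_add(a, b))
print("error rate (ER):", float(np.mean(err > 0)))      # fraction of wrong sums
print("mean error distance (MED):", float(err.mean()))  # average |error|
```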

Publication history
  • Received: 2023-05-04
  • Revised: 2023-08-23
  • Published online: 2023-08-25
  • Issue published: 2023-09-27
