面向数字信号处理领域的近似计算技术应用与研究进展

王旭; 陈珂; 闫成刚; 王成华; 刘伟强

doi:10.11999/JEIT231245

面向数字信号处理领域的近似计算技术应用与研究进展

doi: 10.11999/JEIT231245 cstr: 32379.14.JEIT231245

1.
南京航空航天大学集成电路学院南京 211106
2.
空天集成电路与微系统工信部重点实验室南京 211106

基金项目: 国家重点研发计划青年科学家项目(2022YFB4500200)，国家自然科学基金(62101252, 62022041)

详细信息

作者简介:
王旭：女，博士生，研究方向为近似计算

陈珂：男，副研究员，研究方向为近似计算电路设计

闫成刚：男，副研究员，研究方向为混合信号集成电路设计

王成华：男，教授，研究方向为集成电路设计、验证与测试

刘伟强：男，教授，研究方向为高能效高安全性新兴计算集成电路与系统

通讯作者:
刘伟强　liuweiqiang@nuaa.edu.cn

中图分类号: TN402; TP183
计量
- 文章访问数: 1199
- HTML全文浏览量: 1043
- PDF下载量: 122
- 被引次数: 0
出版历程
- 收稿日期: 2023-11-09
- 修回日期: 2024-03-29
- 网络出版日期: 2024-05-07
- 刊出日期: 2024-05-30

Progress in The Application and Research of Approximate Computation Techniques Oriented to The Field of Digital Signal Processing

1.
College of Integrated Circuits, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2.
Key Laboratory of Aerospace Integrated Circuits and Microsystem, Ministry of Industry and Information Technology, Nanjing 211106, China

Funds: The National Key Research and Development Program of China (2022YFB4500200), The National Natural Science Foundation of China (62101252, 62022041)

摘要

摘要: 在信号处理领域，近似计算技术备受关注。复杂算法和海量数据限制了应用的处理速度且增加了系统硬件消耗。由于信号具有冗余性，精确结果并非必需，满足用户可接受的结果已足够。因此，采用近似计算技术可以有效减少计算量，提高计算效率和系统性能。该文以近似计算技术的不同设计层次为切入，首先介绍了信号处理应用的特点，综述了近年来近似计算技术在算法和电路层面的研究进展，并调研了通信、视频图像以及雷达等信号处理方向的近似计算技术方案。最后，对该领域的发展方向进行了讨论和展望，为推动近似计算技术在信号处理领域的应用提供了思路。
- 近似计算 /
- 新兴计算范式 /
- 数字信号处理 /
- 研究进展
Abstract: In the field of signal processing, approximate computing techniques have garnered significant attention. Complex algorithms and massive data impose limitations on processing speed and increase system hardware consumption. Since signals often contain redundancy, precise results are not always necessary, and achieving results acceptable to users is sufficient. Therefore, employing approximate computing techniques can effectively reduce computational complexity, enhance computational efficiency, and improve system performance. This paper takes a hierarchical approach to the design of approximate computing techniques. It first introduces the characteristics of signal processing applications, reviews recent research progress in approximate computing techniques at the algorithm and circuit levels, and investigates approximate computing solutions in signal processing directions such as communication, video imaging, and radar. Finally, it discusses and prospects the development direction of this field, providing insights to promote the application of approximate computing techniques in signal processing.
- Approximate computing /
- Emerging computing paradigm /
- Digital signal processing /
- Research progress

HTML全文

图 1 近似计算技术及其应用

下载: 全尺寸图片幻灯片

表 1 加法器近似技术

近似方法	相关工作	概述
非分段推测近似	文献[6]	通过缩短进位链来获得更快速和更高性能的加法器。
分段推测近似	文献[7–9]	通过缩短进位链来获得更快速和更高性能的加法器。
晶体管级近似	文献[10–12]	通过减少晶体管和基本门的数量，显著降低功耗。

下载: 导出CSV

表 2 定点乘法器近似技术

近似方法	相关工作	概述
操作数近似	文献[14]	通过将二进制乘法转换为对数域的加法，具有极低的功耗。
阵列近似	文献[15]	通过对部分积矩阵的一部分低有效位直接舍去，以达到调整输出位宽和降低乘法器功耗面积的目的。
部分积近似	文献[16–18]	利用卡诺图对Booth编码结果进行优化，从而精简 Booth 算法的部分积表达式。
压缩器近似	文献[19–21]	通过切断压缩器同级之间进位链使得乘法器在功耗、延迟和晶体管数量方面都实现了显著降低。

下载: 导出CSV

表 3 除法器近似技术

近似方法	相关工作	概述
阵列近似	文献[26, 27]	对传统阵列结构中的减法器进行近似设计，降低除法器阵列的复杂度。
操作数近似	文献[28, 29]	对操作数进行截断，或者从首位 1 开始截断，在较小精度损失下大大减小了运算延时和能量。
阵列和操作数混合近似	文献[30]	利用卡诺图对 Booth编码结果进行优化，从而精简 Booth 算法的部分积表达式。

下载: 导出CSV

表 4 乘法累加器近似技术

近似方法	相关工作	概述
乘法器近似	文献[31, 32]	通过将乘法操作数进行分段并设计近似 MAC。
加法器近似	文献[33, 34]	通过将近似加法器运用到有符号 MAC 单元的最后一步进位传播加法器中，同时使用 VOS 调节输入电压，最终实现功耗降低。
乘法加法阵列合并近似	文献[35–37]	通过将累加插入到乘法部分积阵列中，设计近似MAC 单元。

下载: 导出CSV

参考文献(60)

[1]	LIU Weiqiang and LOMBARDI F. Approximate Computing[M]. Cham: Springer, 2022: 365–368. doi: 10.1007/978-3-030-98347-5.
[2]	LIU Weiqiang, LOMBARDI F, and SCHULTE M. Approximate computing: From circuits to applications[J]. Proceedings of the IEEE, 2020, 108(12): 2103–2107. doi: 10.1109/JPROC.2020.3033361.
[3]	CHIPPA V K, MOHAPATRA D, ROY K, et al. Scalable effort hardware design[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2014, 22(9): 2004–2016. doi: 10.1109/TVLSI.2013.2276759.
[4]	NOWICK S M. Design of a low-latency asynchronous adder using speculative completion[J]. IEE Proceedings - Computers and Digital Techniques, 1996, 143(5): 301–307. doi: 10.1049/ip-cdt:19960704.
[5]	LU S L. Speeding up processing with approximation circuits[J]. Computer, 2004, 37(3): 67–73. doi: 10.1109/MC.2004.1274006.
[6]	ESPOSITO D, DE CARO D, NAPOLI E, et al. Variable latency speculative Han-Carlson adder[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2015, 62(5): 1353–1361. doi: 10.1109/TCSI.2015.2403036.
[7]	SEOK H, SEO H, LEE J, et al. A novel efficient approximate adder design using single input pair based computation[C]. 2022 19th International SoC Design Conference (ISOCC), Gangneung-si, Korea, 2022: 57–58. doi: 10.1109/ISOCC56007.2022.10031341.
[8]	SEO H and KIM Y. A low latency approximate adder design based on dual sub-adders with error recovery[J]. IEEE Transactions on Emerging Topics in Computing, 2023, 11(3): 811–816. doi: 10.1109/TETC.2023.3270963.
[9]	MANOHAR P S, ROHAN B, RAMANA P V S, et al. Implementation of carry look Ahead adder with 2-bit approximate adder[C]. 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 2023: 1543–1547. doi: 10.1109/ICAAIC56838.2023.10140683.
[10]	YAN Aibin, WEI Shaojie, LI Zhixing, et al. Design of low-cost approximate CMOS full adders[C]. 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10181531.
[11]	LAGIDI P, ISWARYA A, RAJESH G, et al. Design of 16-bit and 32-bit approximate full adder using majority logic[C]. 2021 2nd Global Conference for Advancement in Technology (GCAT), Bangalore, India, 2021: 1–5. doi: 10.1109/GCAT52182.2021.9587782.
[12]	LIU Bo, XUE Anfeng, WANG Ziyu, et al. A reconfigurable approximate computing architecture with dual-VDD for low-power Binarized weight network deployment[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2023, 70(1): 291–295. doi: 10.1109/TCSII.2022.3196045.
[13]	MITCHELL J N. Computer multiplication and division using binary logarithms[J]. IRE Transactions on Electronic Computers, 1962, EC-11(4): 512–517. doi: 10.1109/TEC.1962.5219391.
[14]	KIM M S, DEL BARRIO A A, OLIVEIRA L T, et al. Efficient Mitchell's approximate log multipliers for convolutional neural networks[J]. IEEE Transactions on Computers, 2019, 68(5): 660–675. doi: 10.1109/TC.2018.2880742.
[15]	NUNZIATA I, ZACHARELOS E, SAGGESE G, et al. Approximate recursive multipliers using carry truncation and error compensation[C]. 2022 17th Conference on Ph. D. Research in Microelectronics and Electronics (PRIME), Villasimius, Italy, 2022: 137–140. doi: 10.1109/PRIME55000.2022.9816787.
[16]	WARIS H, WANG Chenghua, LIU Weiqiang, et al. Hybrid partial product-based high-performance approximate recursive multipliers[J]. IEEE Transactions on Emerging Topics in Computing, 2022, 10(1): 507–513. doi: 10.1109/TETC.2020.3013977.
[17]	SHANKAR R G and ANANTHI D R. Approximate booth multipliers using compressors and counter[C]. 2023 International Conference on Inventive Computation Technologies (ICICT), Lalitpur, Nepal, 2023: 1658–1662. doi: 10.1109/ICICT57646.2023.10134198.
[18]	LIU Bo, CAI Hao, ZHANG Zilong, et al. Multiplication circuit architecture for error- tolerant CNN-based keywords speech recognition[J]. IEEE Design & Test, 2023, 40(3): 26–35. doi: 10.1109/MDAT.2021.3135346.
[19]	SAYADI L, TIMARCHI S, and SHEIKH-AKBARI A. Two efficient approximate unsigned multipliers by developing new configuration for approximate 4: 2 compressors[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(4): 1649–1659. doi: 10.1109/TCSI.2023.3242558.
[20]	ZHANG Mingtao, NISHIZAWA S, and KIMURA S. Area efficient approximate 4-2 compressor and probability-based error adjustment for approximate multiplier[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2023, 70(5): 1714–1718. doi: 10.1109/TCSII.2023.3257852.
[21]	XIE Na, ZHANG Renyuan, YAN Han, et al. Compressors evolution based high speed and energy efficient approximate signed multiplier[C]. 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), Nanjing, China, 2022: 1–3. doi: 10.1109/ICSICT55466.2022.9963435.
[22]	TONG J Y F, NAGLE D, and RUTENBAR R A. Reducing power by optimizing the necessary precision/range of floating-point arithmetic[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000, 8(3): 273–286. doi: 10.1109/92.845894.
[23]	EILERT J, EHLIAR A, and LIU Dake. Using low precision floating point numbers to reduce memory cost for MP3 decoding[C]. 2004 IEEE 6th Workshop on Multimedia Signal Processing, Siena, Italy, 2004: 119–122. doi: 10.1109/MMSP.2004.1436435.
[24]	ZHANG Hang, PUTIC M, and LACH J. Low power GPGPU computation with imprecise hardware[C]. 51st ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, USA, 2014: 1–6. doi: 10.1109/dac.2014.6881426.
[25]	YIN Peipei, WANG Chenghua, LIU Weiqiang, et al. Design and performance evaluation of approximate floating-point multipliers[C]. 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Pittsburgh, USA, 2016: 296–301. doi: 10.1109/ISVLSI.2016.15.
[26]	SAVIO M M D, DEEPA T, DHARSHINI P D, et al. Design and implementation of approximate divider for error-resilient image processing applications[C]. 2023 Second International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Trichirappalli, India, 2023: 1–5. doi: 10.1109/ICEEICT56924.2023.10157050.
[27]	SHRIRAM A, TIWARI A, ANIL KUMAR U, et al. Power efficient approximate divider architecture for error resilient applications[C]. 2022 IEEE 6th Conference on Information and Communication Technology (CICT), Gwalior, India, 2022: 1–6. doi: 10.1109/CICT56698.2022.9997960.
[28]	WU Yong, JIANG Honglan, MA Zining, et al. An energy-efficient approximate divider based on logarithmic conversion and piecewise constant approximation[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(7): 2655–2668. doi: 10.1109/TCSI.2022.3167894.
[29]	SAADAT H, JAVAID H, and PARAMESWARAN S. Approximate integer and floating-point dividers with near-zero error bias[C]. 2019 56th ACM/IEEE Design Automation Conference (DAC), Las Vegas, USA, 2019: 1–6. doi: 10.1145/3316781.3317773.
[30]	LIU Weiqiang, XU Tao, LI Jing, et al. Design of unsigned approximate hybrid dividers based on restoring array and logarithmic dividers[J]. IEEE Transactions on Emerging Topics in Computing, 2022, 10(1): 339–350. doi: 10.1109/TETC.2020.3022290.
[31]	WUERDIG R N, SARTORI M L L, ABREU B A, et al. Mitigating asynchronous QDI drawbacks on MAC operators with approximate multipliers[C]. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, USA, 2022: 1269–1273. doi: 10.1109/ISCAS48785.2022.9937420.
[32]	MISHRA V, PANDEY D, SINGH S, et al. ART-MAC: Approximate rounding and truncation based MAC unit for fault-tolerant applications[C]. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, USA, 2022: 1640–1644. doi: 10.1109/ISCAS48785.2022.9937437.
[33]	ESPOSITO D, DE CARO D, NAPOLI E, et al. On the use of approximate adders in carry-save multiplier-accumulators[C]. 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, USA, 2017: 1–4. doi: 10.1109/ISCAS.2017.8050437.
[34]	WANG Ziyu, WEI Qingwen, XUE Anfeng, et al. Low-power computing unit based on heterogeneous approximate structure for binary convolutional neural network[C]. 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), Nanjing, China, 2022: 1–3. doi: 10.1109/ICSICT55466.2022.9963452.
[35]	MEO G D, SAGGESE G, STROLLO A G M, et al. Approximate MAC unit using static segmentation[J]. IEEE Transactions on Emerging Topics in Computing, 2023. doi: 10.1109/TETC.2023.3315301.
[36]	LIU Bo, ZHANG Zilong, CAI Hao, et al. Self-compensation tensor multiplication unit for adaptive approximate computing in low-power CNN processing[J]. Science China Information Sciences, 2022, 65(4): 149403. doi: 10.1007/s11432-021-3242-6.
[37]	LIU Bo, ZHANG Renyuan, SHEN Qiao, et al. W-AMA: Weight-aware approximate multiplication architecture for neural processing[J]. Computers and Electrical Engineering, 2023, 111: 108921. doi: 10.1016/j.compeleceng.2023.108921.
[38]	JIANG Honglan, LIU Leibo, JONKER P P, et al. A high-performance and energy-efficient FIR adaptive filter using approximate distributed arithmetic circuits[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66(1): 313–326. doi: 10.1109/TCSI.2018.2856513.
[39]	ESPOSITO D, DI MEO G, DE CARO D, et al. Quality-scalable approximate LMS filter[C]. 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Bordeaux, France, 2018: 849–852. doi: 10.1109/ICECS.2018.8617858.
[40]	DI MEO G, DE CARO D, PETRA N, et al. A novel low-power DLMS adaptive filter with sign-magnitude learning and approximated FIR section[C]. 2022 17th Conference on Ph. D. Research in Microelectronics and Electronics (PRIME), Villasimius, Italy, 2022: 217–220. doi: 10.1109/PRIME55000.2022.9816770.
[41]	MONTEIRO M, SEIDEL I, GRELLERT M, et al. Exploring the impacts of multiple kernel sizes of Gaussian filters combined to approximate computing in canny edge detection[C]. 2022 IEEE 13th Latin America Symposium on Circuits and System (LASCAS), Puerto Varas, Chile, 2022: 1–4. doi: 10.1109/LASCAS53948.2022.9789080.
[42]	BERGLAND G. Fast Fourier transform hardware implementations-An overview[J]. IEEE Transactions on Audio and Electroacoustics, 1969, 17(2): 104–108. doi: 10.1109/TAU.1969.1162041.
[43]	ELANGO K and MUNIANDI K. VLSI implementation of an area and energy efficient FFT/IFFT core for MIMO-OFDM applications[J]. Annals of Telecommunications, 2020, 75(5/6): 215–227. doi: 10.1007/s12243-019-00742-6.
[44]	LIU Bo, DING Xiaoling, CAI Hao, et al. Precision adaptive MFCC based on R2SDF-FFT and approximate computing for low-power speech keywords recognition[J]. IEEE Circuits and Systems Magazine, 2021, 21(4): 24–39. doi: 10.1109/MCAS.2021.3118175.
[45]	LIU Weiqiang, LIAO Qicong, QIAO Fei, et al. Approximate designs for Fast Fourier Transform (FFT) with application to speech recognition[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66(12): 4727–4739. doi: 10.1109/TCSI.2019.2933321.
[46]	CHEN W H, SMITH C, and FRALICK S. A fast computational algorithm for the discrete cosine transform[J]. IEEE Transactions on Communications, 1977, 25(9): 1004–1009. doi: 10.1109/TCOM.1977.1093941.
[47]	POTLURI U S, MADANAYAKE A, CINTRA R J, et al. Improved 8-point approximate DCT for image and video compression requiring only 14 additions[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2014, 61(6): 1727–1740. doi: 10.1109/TCSI.2013.2295022.
[48]	DA SILVEIRA T L T, CANTERLE D R, COELHO D F G, et al. A class of low-complexity DCT-like transforms for image and video coding[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(7): 4364–4375. doi: 10.1109/TCSVT.2021.3134054.
[49]	XING Yan, ZHANG Ziji, QIAN Yiduan, et al. An energy-efficient approximate DCT for wireless capsule endoscopy application[C]. 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 2018: 1–4. doi: 10.1109/ISCAS.2018.8351769.
[50]	CAI Lulin, QIAN Yiduan, HE Yajuan, et al. Design of approximate multiplierless DCT with CSD encoding for image processing[C]. 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Korea, 2021: 1–4. doi: 10.1109/ISCAS51556.2021.9401200.
[51]	JANHUNEN J, PITKANEN T, SILVEN O, et al. Fixed- and floating-point processor comparison for MIMO-OFDM detector[J]. IEEE Journal of Selected Topics in Signal Processing, 2011, 5(8): 1588–1598. doi: 10.1109/JSTSP.2011.2165830.
[52]	AMIN-NEJAD S, BASHARKHAH K, and GASHTEROODKHANI T A. Floating point versus fixed point tradeoffs in FPGA implementations of QR decomposition algorithm[J]. European Journal of Electrical Engineering and Computer Science, 2019, 3(5). doi: 10.24018/EJECE.2019.3.5.127.
[53]	HU Yao and KOIBUCHI M. Accelerating MPI communication using floating-point compression on lossy interconnection networks[C]. 2021 IEEE 46th Conference on Local Computer Networks (LCN), Edmonton, Canada, 2021: 355–358. doi: 10.1109/LCN52139.2021.9524942.
[54]	PARK J, CHOI J H, and ROY K. Dynamic bit-width adaptation in DCT: An approach to trade off image quality and computation energy[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, 18(5): 787–793. doi: 10.1109/TVLSI.2009.2016839.
[55]	SNIGDHA F S, SENGUPTA D, HU Jiang, et al. Optimal design of JPEG hardware under the approximate computing paradigm[C]. 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, USA, 2016: 1–6. doi: 10.1145/2897937.2898057.
[56]	PU Yu, DE GYVEZ J P, CORPORAAL H, et al. An ultra-low-energy multi-standard JPEG co-processor in 65 nm CMOS with sub/near threshold supply voltage[J]. IEEE Journal of Solid-State Circuits, 2010, 45(3): 668–680. doi: 10.1109/JSSC.2009.2039684.
[57]	FANG Jian, XU Zongben, ZHANG Bingchen, et al. Fast compressed sensing SAR imaging based on approximated observation[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014, 7(1): 352–363. doi: 10.1109/JSTARS.2013.2263309.
[58]	JIANG Chenglong, ZHANG Bingchen, FANG Jian, et al. Efficient ℓ_q regularisation algorithm with range-azimuth decoupled for SAR imaging[J]. Electronics Letters, 2014, 50(3): 204–205. doi: 10.1049/el.2013.1989.
[59]	LI Bo, LIU Falin, ZHOU Chongbin, et al. Mixed sparse representation for approximated observation-based compressed sensing radar imaging[J]. Journal of Applied Remote Sensing, 2018, 12(3): 035015. doi: 10.1117/1.JRS.12.035015.
[60]	LI Bo, LIU Falin, ZHOU Chongbin, et al. Phase error correction for approximated observation-based compressed sensing radar imaging[J]. Sensors, 2017, 17(3): 613. doi: 10.3390/s17030613.