Design of High Precision Low Power Approximate Floating-point Multiplier Based on Partial Product Probability Analysis

YAN Chenggang; ZHAO Xuan; XU Chenyu; CHEN Ke; GE Jipeng; WANG Chenghua; LIU Weiqiang

doi:10.11999/JEIT211485

cstr: 32379.14.JEIT211485

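College of Electronic and Information Engineering/College of Integrated Circuits, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Funds: National Natural Science Foundation of China (62101246, 62022041, 62101252); Natural Science Foundation of Jiangsu Province (BK20200417); Innovative and Entrepreneurial Talents of Jiangsu Province (2020-30377)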

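Details

Author biographies:
YAN Chenggang: male, lecturer. Research interests: mixed-signal (analog-digital) integrated circuit design and approximate communication integrated circuit design.

ZHAO Xuan: male, master's student. Research interests: approximate arithmetic unit design and approximate FFT design.

XU Chenyu: male, master's student. Research interests: approximate computing integrated circuit design.

CHEN Ke: male, associate research fellow. Research interests: approximate computing integrated circuit design.

GE Jipeng: male, master's student. Research interests: approximate computing integrated circuit design.

WANG Chenghua: male, professor. Research interests: information security chips and physical unclonable functions.

LIU Weiqiang: male, professor. Research interests: digital integrated circuit design and mixed-signal integrated circuit design.

Corresponding author:
LIU Weiqiang, liuweiqiang@nuaa.edu.cn

CLC number: TN911; TP331.2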
Publication history
- Received: 2021-12-10
- Revised: 2022-02-24
- Accepted: 2022-03-03
- Published online: 2022-03-08
- Issue publication date: 2023-01-17

Abstract

Floating-point multipliers are key arithmetic units in systems such as High Dynamic Range (HDR) image processing and wireless communication. Compared with fixed-point multipliers, they offer a wider dynamic range at the cost of higher complexity. Approximate computing, an emerging paradigm, can substantially reduce hardware resources and power consumption within a bounded accuracy loss. This paper proposes a 16-bit half-precision Approximate Floating-point Multiplier (App-Fp-Mul). For the mantissa multiplication module, and based on the probability of a 1 occurring in its partial product array, an approximate 4-2 compressor that is insensitive to input order and an OR-gate compression method for the low-order bits are proposed. Together they reduce the resources and power consumption of the floating-point multiplier with only a small loss of precision. Compared with the exact design, the proposed multiplier reduces area by 20% and Power Delay Product (PDP) by 58% at a Normalized Mean Error Distance (NMED) of 0.0014. Compared with existing approximate designs at the same approximate bit width, it achieves higher accuracy and a smaller PDP. Finally, the proposed multiplier is applied to HDR image processing, where it reaches a Peak Signal-to-Noise Ratio (PSNR) of 83.16 dB and a Structural Similarity (SSIM) of 99.9989%, a clear improvement over existing approximate designs.

Keywords:
- Approximate computing
- Approximate Floating-point Multiplier (App-Fp-Mul)
- Partial product probability analysis
- Low power consumption
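
To place the mantissa multiplication stage named in the abstract in context, the following is a behavioural sketch of an exact half-precision multiply: the sign bits are XORed, the biased exponents are added, and the two 11-bit mantissas (10 stored bits plus the hidden 1) are multiplied. It is an illustrative Python model, not the paper's circuit. The function names are hypothetical, only normal (non-zero, non-subnormal, finite) operands and results are handled, and the result is truncated rather than rounded.

```python
import struct

def fp16_fields(x):
    """Encode x as IEEE 754 half precision and return (sign, exponent, fraction)."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

def fp16_mul_normal(a, b):
    """Exact-structure half-precision multiply (hypothetical helper, truncating)."""
    sa, ea, fa = fp16_fields(a)
    sb, eb, fb = fp16_fields(b)
    if not (0 < ea < 31 and 0 < eb < 31):
        raise ValueError("only normal operands are modelled in this sketch")
    sign = sa ^ sb                         # sign path: one XOR gate
    ma = (1 << 10) | fa                    # restore the hidden leading 1
    mb = (1 << 10) | fb
    product = ma * mb                      # 11x11 mantissa multiplication (22 bits)
    exponent = ea + eb - 15                # add exponents, remove one bias
    if product >> 21:                      # mantissa product in [2, 4): renormalise
        exponent += 1
        fraction = (product >> 11) & 0x3FF
    else:                                  # mantissa product in [1, 2)
        fraction = (product >> 10) & 0x3FF
    if not 0 < exponent < 31:
        raise OverflowError("result is not a normal half-precision number")
    bits = (sign << 15) | (exponent << 10) | fraction
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, `fp16_mul_normal(1.5, 2.0)` returns `3.0`. In an approximate design of the kind described here, only the `ma * mb` step would be replaced by an approximate partial product reduction; the sign and exponent paths stay exact.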

Fig. 1 Structure of the floating-point multiplier

Fig. 2 Conventional exact 4-2 compressor

Fig. 3 Full adder

Fig. 4 Ahma approximate 4-2 compressor^[22]

Fig. 5 Proposed approximate 4-2 compressor

Fig. 6 11×11 partial product array

Fig. 7 Probability-based OR-gate compression

Fig. 8 PDP versus NMED

Fig. 9 PDP versus MRED

Fig. 10 PDP versus MSE

Fig. 11 Image processing results

Table 1 Probability of a 1 in each mantissa bit of a floating-point number

Bit weight	A[10]	A[9]	A[8]	A[7]	A[6]	A[5]	A[4]	A[3]	A[2]	A[1]	A[0]
Gaussian distribution	0.97	0.42	0.46	0.48	0.49	0.49	0.50	0.50	0.50	0.50	0.50
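
The bit probabilities in Table 1 can be turned into a per-cell estimate for the 11×11 partial product array. The sketch below assumes that both operands follow the Table 1 distribution and that all bits are statistically independent; these are assumptions of this example, not statements from the paper. Under them, partial product A[i]·B[j] equals 1 with probability P(A[i])·P(B[j]). All names are hypothetical.

```python
# Probability that each mantissa bit is 1, copied from Table 1.
# Index 0 corresponds to A[0], index 10 to A[10] (the hidden bit).
P_BIT = [0.50, 0.50, 0.50, 0.50, 0.50, 0.49, 0.49, 0.48, 0.46, 0.42, 0.97]

def partial_product_probabilities(p_a, p_b):
    """Return a matrix where entry [j][i] is P(A[i] AND B[j] == 1), assuming independence."""
    return [[pa * pb for pa in p_a] for pb in p_b]

def column_expected_ones(pp):
    """Expected number of 1s in each product column (column weight = i + j)."""
    n = len(pp)
    columns = [0.0] * (2 * n - 1)
    for j, row in enumerate(pp):
        for i, p in enumerate(row):
            columns[i + j] += p
    return columns

pp = partial_product_probabilities(P_BIT, P_BIT)
columns = column_expected_ones(pp)
```

Under these assumptions a low-order partial product such as A[0]·B[0] is 1 with probability 0.25, so several of them being 1 at the same time is unlikely. That is the kind of observation a probability-guided choice of approximate compressors can exploit.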

Table 2 Delay of logic gates^[24]

	AND	OR	XOR
Normalized delay	0.7	0.7	1.0
Transistor count	2N+2	2N+2	4N+2

Table 3 Truth table of the approximate 4-2 compressor

P1	P2	P3	P4	Sum	Carry	Error
0	0	0	0	0	0	0
0	0	0	1	1	0	0
0	0	1	0	1	0	0
0	0	1	1	0	1	0
0	1	0	0	1	0	0
0	1	0	1	0	1	0
0	1	1	0	0	1	0
0	1	1	1	1	1	0
1	0	0	0	1	0	0
1	0	0	1	0	1	0
1	0	1	0	0	1	0
1	0	1	1	1	1	0
1	1	0	0	0	1	0
1	1	0	1	1	1	0
1	1	1	0	1	1	0
1	1	1	1	0	1	–2
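
Table 3 can be reproduced with a small behavioural model. The Boolean description used here is inferred from the table itself (Sum is the parity of the four inputs, Carry is 1 when at least two inputs are 1); it is not the gate-level structure of Fig. 5. The function names and the example probability of 0.25 per input are assumptions made for illustration.

```python
from itertools import product

def approx_compressor_4_2(p1, p2, p3, p4):
    """Behavioural model matching Table 3; returns (sum_bit, carry)."""
    ones = p1 + p2 + p3 + p4
    sum_bit = ones & 1                 # parity of the inputs
    carry = 1 if ones >= 2 else 0      # saturates: cannot represent a value of 4
    return sum_bit, carry

def compressor_error_table():
    """List (inputs, sum, carry, error) for all 16 input combinations."""
    rows = []
    for bits in product((0, 1), repeat=4):
        s, c = approx_compressor_4_2(*bits)
        rows.append((bits, s, c, (s + 2 * c) - sum(bits)))
    return rows

def all_ones_probability(probabilities):
    """P(all inputs are 1), assuming the inputs are independent."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

error_rows = [row for row in compressor_error_table() if row[3] != 0]
```

The model depends only on how many inputs are 1, which is consistent with the abstract's description of the compressor as insensitive to input order. The only erroneous row is 1111 with error -2; if each input were 1 independently with probability 0.25, that case would occur with probability 0.25^4, about 0.0039.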

Table 4 Truth table of the OR gate

P1	P2	Out	Error
0	0	0	0
0	1	1	0
1	0	1	0
1	1	1	–1
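
The OR-gate compression in Table 4 replaces the addition of two partial product bits with a single OR, which is wrong only when both bits are 1. A minimal sketch, assuming the two bits are independent with probabilities q1 and q2 of being 1 (an assumption of this example; the names are hypothetical):

```python
from itertools import product

def or_compress(p1, p2):
    """Approximate the sum p1 + p2 with a single OR gate."""
    return p1 | p2

def or_truth_table():
    """List (p1, p2, out, error) for the four input combinations."""
    return [(p1, p2, or_compress(p1, p2), or_compress(p1, p2) - (p1 + p2))
            for p1, p2 in product((0, 1), repeat=2)]

def or_expected_error(q1, q2):
    """Expected error of the OR approximation for independent inputs."""
    return -(q1 * q2)                  # error is -1 only when both inputs are 1
```

With q1 = q2 = 0.25 the expected error is -0.0625 in units of that column's weight, which gives some intuition for why such a simplification is applied to the low-order columns.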

Table 5 Accuracy metrics of the approximate mantissa multipliers

	NMED (10^-3)	MRED (10^-2)	MSE (10^9)
App-Man-Mul1	2.7121	3.1512	0.3445
App-Man-Mul2^[9]	92.1184	5797.7764	1328.3386
App-Man-Mul3^[10]	5.1695	4.6301	0.7751
App-Man-Mul4^[11]	6.7848	5.7602	1.8995

Table 6 Accuracy metrics of the approximate floating-point multipliers

	NMED (10^-3)	MRED (10^-2)	MSE (10^5)
App-Fp-Mul1	1.4094	1.0233	0.4118
App-Fp-Mul2^[9]	7.8399	6.3438	30.8504
App-Fp-Mul3^[10]	2.1112	1.5094	0.6676
App-Fp-Mul4^[11]	3.2446	2.3177	1.8812
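
The three accuracy metrics in Tables 5 and 6 can be computed from paired exact and approximate results. The sketch below uses the definitions common in the approximate computing literature; the exact normalisation used in the paper is not stated on this page, so two choices here are assumptions: NMED is normalised by the largest exact magnitude, and zero exact values are skipped for MRED. The function name is hypothetical.

```python
def error_metrics(exact, approx):
    """Return NMED, MRED and MSE for paired exact/approximate results."""
    if not exact or len(exact) != len(approx):
        raise ValueError("exact and approx must be non-empty and equally long")
    n = len(exact)
    distances = [abs(a - e) for e, a in zip(exact, approx)]   # error distances
    max_exact = max(abs(e) for e in exact)
    if max_exact == 0:
        raise ValueError("NMED is undefined when all exact results are zero")
    relative = [d / abs(e) for d, e in zip(distances, exact) if e != 0]
    if not relative:
        raise ValueError("MRED is undefined when all exact results are zero")
    return {
        "NMED": (sum(distances) / n) / max_exact,   # mean error distance, normalised
        "MRED": sum(relative) / len(relative),      # mean relative error distance
        "MSE": sum(d * d for d in distances) / n,   # mean squared error
    }

metrics = error_metrics([4, 10, 20], [4, 9, 22])
```

In practice `exact` and `approx` would hold the outputs of the exact and approximate multipliers over a set of input pairs, for example one drawn from the distribution behind Table 1.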

Table 7 Hardware metrics of the approximate multipliers (simulation frequency 500 MHz)

	Area (μm²)	Power (mW)	Delay (ns)	PDP (pJ)
Ex-Man-Mul	301.644	0.1757	1.86	0.3268
App-Man-Mul1	156.114	0.0568	1.08	0.0613
App-Man-Mul2^[9]	219.996	0.0762	1.11	0.0846
App-Man-Mul3^[10]	222.894	0.0639	1.13	0.0722
App-Man-Mul4^[11]	162.666	0.0532	1.10	0.0585

Table 8 Hardware metrics of the approximate floating-point multipliers (simulation frequency 200 MHz)

	Area (μm²)	Power (mW)	Delay (ns)	PDP (pJ)
Ex-Fp-Mul	713.1600	0.1562	4.90	0.7654
App-Fp-Mul1	568.7640	0.0779	4.17	0.3249
App-Fp-Mul2^[9]	631.3860	0.0898	4.33	0.3888
App-Fp-Mul3^[10]	633.4020	0.0834	4.21	0.3511
App-Fp-Mul4^[11]	573.1740	0.0775	4.14	0.3209
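
The PDP column is the product of the power and delay columns (mW × ns = pJ), and the savings quoted in the abstract follow from the Ex-Fp-Mul and App-Fp-Mul1 rows. A short check, using only values copied from Table 8; the helper names are hypothetical:

```python
# (area in um^2, power in mW, delay in ns), copied from Table 8.
TABLE_8 = {
    "Ex-Fp-Mul": (713.1600, 0.1562, 4.90),
    "App-Fp-Mul1": (568.7640, 0.0779, 4.17),
}

def pdp_pj(power_mw, delay_ns):
    """Power delay product in pJ (mW x ns)."""
    return power_mw * delay_ns

def reduction(baseline, value):
    """Fractional saving of value relative to baseline."""
    return 1.0 - value / baseline

ex_area, ex_power, ex_delay = TABLE_8["Ex-Fp-Mul"]
app_area, app_power, app_delay = TABLE_8["App-Fp-Mul1"]

area_saving = reduction(ex_area, app_area)                   # about 0.20
pdp_saving = reduction(pdp_pj(ex_power, ex_delay),
                       pdp_pj(app_power, app_delay))         # about 0.58
```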

Table 9 Quantitative metrics of the processed images

	App-Fp-Mul1	App-Fp-Mul2^[9]	App-Fp-Mul3^[10]	App-Fp-Mul4^[11]
PSNR (dB)	83.1639	68.1124	54.8700	76.0141
SSIM(%)	99.9989	94.8648	99.4831	99.9949
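
For reference, PSNR compares a processed image with a reference through the mean squared error. The sketch below uses the standard definition on flat lists of pixel values. The peak value to use for the HDR images in the paper is not given on this page, so `peak` is left as a parameter, and the function name is hypothetical. SSIM is not sketched here because it needs windowed local statistics.

```python
import math

def psnr(reference, test, peak):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if not reference or len(reference) != len(test):
        raise ValueError("inputs must be non-empty and equally long")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf                # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the approximate result is closer to the reference, which is how the values in Table 9 rank the four multipliers.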

References (24)

[1]	LIU Weiqiang, LOMBARDI F, and SCHULTE M. A retrospective and prospective view of approximate computing[J]. Proceedings of the IEEE, 2020, 108(3): 394–399. doi: 10.1109/JPROC.2020.2975695
[2]	WILSON L. International technology roadmap for semiconductors (ITRS)[EB/OL]. https://www.semiconductors.org/resources/2013-international-technology-roadmap-for-semiconductors-itrs/, 2013.
[3]	VENKATARAMANI S, CHAKRADHAR S T, ROY K, et al. Computing approximately, and efficiently[C]. 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2015: 748–751.
[4]	CHIPPA V K, CHAKRADHAR S T, ROY K, et al. Analysis and characterization of inherent application resilience for approximate computing[C]. The 50th Annual Design Automation Conference (DAC), Austin, USA, 2013: 113.
[5]	LIU Bo, CAI Hao, WANG Zhen, et al. A 22nm, 10.8 μW/15.1 μW dual computing modes high power-performance-area efficiency domained background noise aware keyword-spotting processor[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67(12): 4733–4746. doi: 10.1109/TCSI.2020.2997913
[6]	LIU Bo, DING Xiaoling, CAI Hao, et al. Precision adaptive MFCC based on R2SDF-FFT and approximate computing for low-power speech keywords recognition[J]. IEEE Circuits and Systems Magazine, 2021, 21(4): 24–39. doi: 10.1109/MCAS.2021.3118175
[7]	WARIS H, WANG Chenghua, LIU Weiqiang, et al. Hybrid low radix encoding-based approximate booth multipliers[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(12): 3367–3371. doi: 10.1109/TCSII.2020.2975094
[8]	VENKATACHALAM S, ADAMS E, LEE H J, et al. Design and analysis of area and power efficient approximate booth multipliers[J]. IEEE Transactions on Computers, 2019, 68(11): 1697–1703. doi: 10.1109/TC.2019.2926275
[9]	LIU Weiqiang, QIAN Liangyu, WANG Chenghua, et al. Design of approximate radix-4 booth multipliers for error-tolerant computing[J]. IEEE Transactions on Computers, 2017, 66(8): 1435–1441. doi: 10.1109/TC.2017.2672976
[10]	YI Xilin, PEI Haoran, ZHANG Ziji, et al. Design of an energy-efficient approximate compressor for error-resilient multiplications[C]. 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 2019: 1–5.
[11]	FANG Bao, LIANG Huaguo, XU Dawen, et al. Approximate multipliers based on a novel unbiased approximate 4–2 compressor[J]. Integration, 2021, 81: 17–24. doi: 10.1016/j.vlsi.2021.05.003
[12]	HA M and LEE S. Multipliers with approximate 4–2 compressors and error recovery modules[J]. IEEE Embedded Systems Letters, 2018, 10(1): 6–9. doi: 10.1109/LES.2017.2746084
[13]	AKBARI O, KAMAL M, AFZALI-KUSHA A, et al. Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2017, 25(4): 1352–1361. doi: 10.1109/TVLSI.2016.2643003
[14]	SABETZADEH F, MOAIYERI M H, and AHMADINEJAD M. A majority-based imprecise multiplier for ultra-efficient approximate image multiplication[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66(11): 4200–4208. doi: 10.1109/TCSI.2019.2918241
[15]	PEI Haoran, YI Xilin, ZHOU Hang, et al. Design of ultra-low power consumption approximate 4–2 compressors based on the compensation characteristic[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2021, 68(1): 461–465. doi: 10.1109/TCSII.2020.3004929
[16]	NIU Zijing, JIANG Honglan, ANSARI M S, et al. A logarithmic floating-point multiplier for the efficient training of neural networks[C]. The 2021 on Great Lakes Symposium on VLSI, New York, USA, 2021: 65–70.
[17]	JHA C K, WALIA S, KANOJIA G, et al. FPCAM: Floating point configurable approximate multiplier for error resilient applications[C]. 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Korea, 2021: 1–5.
[18]	YIN Peipei, WANG Chenghua, LIU Weiqiang, et al. Design and performance evaluation of approximate floating-point multipliers[C]. 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Pittsburgh, USA, 2016: 296–301.
[19]	IEEE. IEEE Std 754–2008 IEEE standard for floating-point arithmetic[S]. IEEE, 2008.
[20]	TONG J Y F, NAGLE D, and RUTENBAR R A. Reducing power by optimizing the necessary precision/range of floating-point arithmetic[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000, 8(3): 273–286. doi: 10.1109/92.845894
[21]	HSIAO S F, JIANG M R, and YEH J S. Design of high-speed low-power 3–2 counter and 4–2 compressor for fast multipliers[J]. Electronics Letters, 1998, 34(4): 341–343. doi: 10.1049/el:19980306
[22]	AHMADINEJAD M, MOAIYERI M H, and SABETZADEH F. Energy and area efficient imprecise compressors for approximate multiplication at nanoscale[J]. AEU - International Journal of Electronics and Communications, 2019, 110: 152859. doi: 10.1016/j.aeue.2019.152859
[23]	STROLLO A G M, NAPOLI E, DE CARO D, et al. Comparison and extension of approximate 4–2 compressors for low-power approximate multipliers[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67(9): 3021–3034. doi: 10.1109/TCSI.2020.2988353
[24]	ZHU Yuying. Design and evaluation of improved approximate Booth multipliers and probabilistic error model analysis[D]. [Master dissertation], Nanjing University of Aeronautics and Astronautics, 2020: 13–15.