Adaptive Image Interpolation Algorithm and Acceleration Engine Co-Design

YAN Xinkai; DING Sheng

doi:10.11999/JEIT221503

YAN Xinkai^{1,2}, DING Sheng^{2}

1. Zhejiang University, Hangzhou 310015, China
2. Jiangsu Key Laboratory of ASIC Design (Wuxi), Wuxi 214153, China

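Funds: Natural Science Research Project of Jiangsu Higher Education Institutions (19KJB510027), Jiangsu “333 Project” Research Funding (BRA2020318), Open Fund of Jiangsu Key Laboratory of ASIC Design (2020KLOP005)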

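About the authors:
YAN Xinkai: male, lecturer, Ph.D. candidate; research interests include intelligent graphics chip design.

DING Sheng: male, associate professor, Ph.D.; research interests include FPGA design.

Corresponding author:
YAN Xinkai, yanxinkai@zju.edu.cn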

CLC number: TN492
Publication history
- Received: 2022-12-02
- Revised: 2023-04-12
- Published online: 2023-04-19
- Issue date: 2023-09-27

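Abstract

To improve the super-resolution reconstruction of high-definition color images, this paper proposes a new adaptive image interpolation algorithm based on edge contrast. The algorithm uses edge-contrast detection and receptive fields of different scales to adaptively select the coefficients of Lanczos interpolation; this adaptivity and the multiple receptive fields further improve the quality of image magnification. Compared with bilinear interpolation, the average Peak Signal-to-Noise Ratio (PSNR) rises by 1.1 dB, the Structural SIMilarity (SSIM) by 0.025, and the Learned Perceptual Image Patch Similarity (LPIPS, lower is better) falls by 0.051; compared with bicubic interpolation, the average PSNR rises by 0.34 dB, SSIM by 0.01, and LPIPS falls by 0.033. To reduce hardware resources and improve storage efficiency, a highly parallel, energy-efficient interpolation acceleration engine is co-designed with the algorithm, in which two-level data reuse and a coefficient systolic mechanism greatly raise the ratio of computation to memory access. Synthesized with a 16 nm process library, the engine reaches a 2 GHz clock frequency; deployed on a Xilinx Zynq UltraScale+ xczu15eg FPGA, it runs at 200 MHz and achieves real-time performance of 60 frames per second (fps).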
- Interpolation algorithm /
- Adaptive /
- Parallelism /
- Energy efficiency /
- Acceleration engine
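The abstract states that edge-contrast detection and receptive fields of different sizes are used to pick the Lanczos coefficients, but gives no formula at this point. The sketch below is a one-dimensional toy under explicit assumptions, not the authors' method: contrast is taken as max minus min of the four nearest samples, the threshold of 32 is arbitrary, and high-contrast neighborhoods switch from Lanczos4 (8 taps) to the narrower Lanczos3 (6 taps) to limit ringing. Only the Lanczos kernel itself, sinc(x)·sinc(x/a) on |x| < a, is standard; `edge_contrast` and `adaptive_interp_1d` are hypothetical names.

```python
import math


def lanczos(x: float, a: int) -> float:
    """Lanczos-a kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def edge_contrast(window: list[float]) -> float:
    """Hypothetical contrast measure: max minus min over a small neighborhood."""
    return max(window) - min(window)


def adaptive_interp_1d(samples: list[float], pos: float, threshold: float = 32.0):
    """Interpolate `samples` at fractional position `pos`.

    Hypothetical rule: a high-contrast neighborhood uses the narrower
    Lanczos3 kernel (6 taps, less ringing); a smooth one uses Lanczos4 (8 taps).
    Returns (value, a) so the chosen kernel size can be inspected.
    """
    n = len(samples)
    base = math.floor(pos)
    frac = pos - base

    def at(i: int) -> float:
        return samples[min(max(i, 0), n - 1)]  # clamp indices at the borders

    # Contrast is measured on the 4 nearest samples (an assumption).
    a = 3 if edge_contrast([at(base + k) for k in range(-1, 3)]) > threshold else 4
    acc = wsum = 0.0
    for k in range(-a + 1, a + 1):  # 2a taps around the target position
        w = lanczos(frac - k, a)
        acc += w * at(base + k)
        wsum += w
    return acc / wsum, a  # normalize so the weights sum to 1
```

On a step edge, `adaptive_interp_1d([0] * 8 + [255] * 8, 7.5)` selects `a = 3` and returns approximately 127.5, the midpoint, because the six weights are symmetric about the target position.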

Figure 1 Flowchart of the edge-optimized image interpolation algorithm

Figure 2 Merged interpolation of pixels from the receptive fields at each level

Figure 3 Overall architecture of the acceleration engine

Figure 4 Block diagram of the interpolation engine

Figure 5 Structure of the edge detection unit

Figure 6 Structure of the interpolation computation unit

Table 1 Number of multiply-accumulate (MAC) units

Module	Count	Operand format
Horizontal interpolation unit	8×3	int8×int16+int24 / int16×int16+int32
Vertical interpolation unit	1×3	int8×int16+int24 / int16×int16+int32
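Table 1 lists 8×3 horizontal and 1×3 vertical MAC units supporting int8×int16+int24 and int16×int16+int32 operations. One plausible reading (an assumption, not stated in the table) is that, per color channel, each of eight horizontal units filters one row of an 8×8 window into a 24-bit accumulator, and one vertical unit combines the eight row results, held as int16, into a 32-bit accumulator. The sketch below models one channel under further assumptions: a Lanczos4 kernel, Q12 fixed-point int16 coefficients, round-half-up shifts, and 8-bit pixels treated as unsigned 0..255. The helpers `lanczos4`, `coeffs_q12`, `check_width`, and `interp_8x8` are hypothetical; `check_width` raises `OverflowError` if any intermediate exceeds its register width.

```python
import math

Q = 12  # hypothetical: int16 coefficients in Q12 fixed point (4096 == 1.0)


def lanczos4(x: float) -> float:
    """Lanczos kernel with a = 4 (support of 8 taps)."""
    if x == 0.0:
        return 1.0
    if abs(x) >= 4:
        return 0.0
    px = math.pi * x
    return 4 * math.sin(px) * math.sin(px / 4) / (px * px)


def coeffs_q12(frac: float) -> list[int]:
    """8 taps for pixel offsets -3..4, normalized and rounded to Q12."""
    w = [lanczos4(frac - k) for k in range(-3, 5)]
    s = sum(w)
    c = [round(v / s * (1 << Q)) for v in w]
    c[3] += (1 << Q) - sum(c)  # absorb rounding error so the taps sum to exactly 1.0
    return c


def check_width(v: int, bits: int) -> int:
    """Raise if v does not fit a signed two's-complement register of `bits` bits."""
    if not -(1 << (bits - 1)) <= v < (1 << (bits - 1)):
        raise OverflowError(f"{v} does not fit in int{bits}")
    return v


def interp_8x8(block: list[list[int]], fx: float, fy: float) -> int:
    """One output sample from an 8x8 block of 8-bit pixels (one color channel).

    block[3][3] is the source pixel at integer offset (0, 0); fx, fy in [0, 1).
    """
    cx, cy = coeffs_q12(fx), coeffs_q12(fy)
    rows = []
    for r in range(8):  # 8 horizontal units: 8-bit pixel x int16 coeff, int24 accumulator
        acc = 0
        for p, c in zip(block[r], cx):
            acc = check_width(acc + p * c, 24)
        rows.append(check_width((acc + (1 << (Q - 1))) >> Q, 16))  # round to int16
    acc = 0
    for v, c in zip(rows, cy):  # 1 vertical unit: int16 x int16, int32 accumulator
        acc = check_width(acc + v * c, 32)
    return min(max((acc + (1 << (Q - 1))) >> Q, 0), 255)  # round, clamp to 8 bits
```

With Q12 scaling, the worst-case horizontal sum is roughly 255 times the sum of the positive coefficients (a few thousand), far below the int24 limit of 8,388,607, which is why a 24-bit accumulator can be enough for the first pass under these assumptions.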

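Table 2 Number of adder units

Module	Count	Bit width
Threshold computation unit	24	Int8
Gradient computation unit	4	Int8
	8	Int9
	4+4 (absolute value)	Int10
	2	Int11
Edge processing unit	4	Int12
Approximate grayscale conversion	21	Int8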
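The widths in Table 2 grow from Int8 to Int11 through the gradient computation unit, which is the pattern expected when each adder stage adds one bit of headroom. The helpers below compute the minimum register width for a value range. The gradient datapath used to exercise them (one pixel difference, then sums of two and four differences, then an absolute value) is a hypothetical illustration chosen to match that progression, not the paper's circuit.

```python
def signed_width(lo: int, hi: int) -> int:
    """Smallest two's-complement width whose range [-2^(b-1), 2^(b-1)-1] covers [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def unsigned_width(hi: int) -> int:
    """Smallest unsigned width covering [0, hi]."""
    return max(1, hi.bit_length())


PIX_MAX = 255  # 8-bit (approximate grayscale) samples
# Hypothetical gradient datapath: one difference, then sums of 2 and 4 differences.
d1 = signed_width(-PIX_MAX, PIX_MAX)          # a - b
d2 = signed_width(-2 * PIX_MAX, 2 * PIX_MAX)  # (a - b) + (c - d)
d4 = signed_width(-4 * PIX_MAX, 4 * PIX_MAX)  # sum of four differences
abs2 = unsigned_width(2 * PIX_MAX)            # |(a - b) + (c - d)|
print(d1, d2, d4, abs2)  # 9 10 11 9
```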

Table 3 RAM capacity of the interpolation engine

Module	Count	Capacity
Interpolation coefficients	4	8×4×2 B
Total		256 B
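Each coefficient RAM in Table 3 holds 8×4×2 B. One layout consistent with the 8-tap kernels of Table 5 and the int16 coefficients of Table 1 is 8 taps × 4 sub-pixel phases × 2 bytes; that reading is an assumption. The sketch builds such a table with Python's `array('h')`. The phase positions p/4, the Q12 scale, and the kernels stored in the four RAMs are all hypothetical, as are the names `coeff_table`, `TAPS`, `PHASES`, and `Q`.

```python
import math
from array import array

TAPS, PHASES, Q = 8, 4, 12  # hypothetical: 8 taps, 4 sub-pixel phases, Q12 fixed point


def lanczos(x: float, a: int) -> float:
    """Lanczos-a kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def coeff_table(a: int) -> array:
    """PHASES x TAPS signed 16-bit coefficients; row p holds phase p / PHASES.

    Taps cover pixel offsets -3..4. For a = 3 the outermost slots are zero,
    so a narrower kernel still fits the same 8-tap layout.
    """
    table = array("h")  # 'h' = signed short (2 bytes on mainstream platforms)
    for p in range(PHASES):
        frac = p / PHASES
        w = [lanczos(frac - k, a) for k in range(-3, 5)]
        s = sum(w)
        row = [round(v / s * (1 << Q)) for v in w]
        row[3] += (1 << Q) - sum(row)  # each phase sums to exactly 1.0 (4096)
        table.extend(row)
    return table


lanczos4_table = coeff_table(4)
bytes_per_table = len(lanczos4_table) * lanczos4_table.itemsize
print(bytes_per_table, 4 * bytes_per_table)  # 64 256 when 'h' is 2 bytes
```

Padding Lanczos3 into the 8-tap layout is one way a fixed 8-tap datapath could serve kernels of different support; it is offered as a design option, not as what the paper does.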

Table 4 Registers in the interpolation engine

Module	Count	Width	Total
Pixel register array	88	3 B	264 B
Approximate grayscale array	42	1 B	42 B
MAC result buffer	51	4 B + 2 B	306 B
Control + edge detection			80 B
Total			692 B
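The totals in Tables 3 and 4 can be cross-checked with a few lines of arithmetic. Adding the register total to the coefficient RAM gives 948 B of on-chip storage for one engine as listed in these two tables; that sum is derived here and is not stated in the paper. The reading of the 3 B pixel entry as one byte per RGB channel is an assumption.

```python
# Rows of Table 4 as (name, count, bytes per entry).
REGISTER_ROWS = [
    ("pixel register array", 88, 3),         # 3 B: presumably one byte per R, G, B channel
    ("approximate grayscale array", 42, 1),
    ("MAC result buffer", 51, 4 + 2),        # 4 B + 2 B per entry
]
CONTROL_AND_EDGE_BYTES = 80                  # given only as a total
RAM_BYTES = 4 * 8 * 4 * 2                    # Table 3: four 8x4x2 B coefficient RAMs


def register_bytes() -> int:
    """Sum of all register rows in Table 4, in bytes."""
    return sum(count * width for _, count, width in REGISTER_ROWS) + CONTROL_AND_EDGE_BYTES


print(register_bytes(), register_bytes() + RAM_BYTES)  # 692 948
```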

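Table 5 Complexity comparison of different algorithms

Algorithm	Time complexity	Multiplications	Pixels
Bilinear interpolation	O(n)	6	4
Bicubic interpolation	O(n)	20	16
Lanczos3 interpolation	O(n)	42	36
Lanczos4 interpolation	O(n)	72	64
Proposed algorithm	O(n)	72	64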
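Every multiplication count in Table 5 matches a separable (row-then-column) evaluation of a k×k kernel: k horizontal k-tap filters plus one vertical k-tap filter, for k² + k products over k² source pixels per output sample. Because that cost is a constant per output pixel, the total work is linear in the number of pixels, hence O(n) in every row. This is a reading of the table that agrees with Table 1's eight horizontal and one vertical unit per channel; `separable_cost` is a hypothetical helper.

```python
def separable_cost(taps: int) -> tuple[int, int]:
    """(multiplications, source pixels) per output sample of a separable
    taps x taps filter: `taps` horizontal dot products of `taps` products
    each, then one vertical dot product over the `taps` row results."""
    return taps * taps + taps, taps * taps


for name, taps in [("bilinear", 2), ("bicubic", 4), ("Lanczos3", 6), ("Lanczos4 / proposed", 8)]:
    print(name, *separable_cost(taps))
```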

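Table 6 PSNR comparison of different algorithms (dB)

Algorithm	Average PSNR	Best PSNR	Worst PSNR
Bilinear interpolation	30.88	35.08	23.15
Bicubic interpolation	31.59	36.00	23.66
Lanczos3 interpolation	31.82	36.34	23.82
Lanczos4 interpolation	31.84	36.39	23.83
Proposed algorithm	31.93	36.42	23.94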
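PSNR is 10·log10(MAX²/MSE), with MAX = 255 for 8-bit images. This section does not say whether color PSNR is computed over RGB or on luma, so the `psnr` helper below simply works on a flat list of samples. The second half reproduces the average-PSNR gains from Table 6; the abstract's 1.1 dB (1.05 rounded) and 0.34 dB correspond to these differences.

```python
import math


def psnr(ref: list[float], test: list[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


# Average PSNR from Table 6 (dB) and the gain of the proposed algorithm over each baseline.
AVG_PSNR = {"bilinear": 30.88, "bicubic": 31.59, "Lanczos3": 31.82, "Lanczos4": 31.84}
PROPOSED = 31.93
gains = {name: round(PROPOSED - v, 2) for name, v in AVG_PSNR.items()}
print(gains)  # {'bilinear': 1.05, 'bicubic': 0.34, 'Lanczos3': 0.11, 'Lanczos4': 0.09}
```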

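Table 7 SSIM comparison of different algorithms

Algorithm	Average SSIM	Best SSIM	Worst SSIM
Bilinear interpolation	0.849	0.920	0.599
Bicubic interpolation	0.864	0.936	0.625
Lanczos3 interpolation	0.868	0.942	0.631
Lanczos4 interpolation	0.869	0.944	0.632
Proposed algorithm	0.874	0.947	0.650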
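SSIM combines luminance, contrast, and structure terms: ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (0.01·L)² and C2 = (0.03·L)² for dynamic range L. The sketch evaluates that expression over one global window with population statistics, a simplification: the standard metric computes it in local (typically 11×11 Gaussian-weighted) windows and averages, and the window settings behind Table 7 are not given here. `ssim_global` is a hypothetical name.

```python
def ssim_global(x: list[float], y: list[float], peak: float = 255.0) -> float:
    """SSIM computed over a single window covering the whole signal.

    Simplification: standard SSIM evaluates this expression in local
    windows and averages the results.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # usual K1 = 0.01, K2 = 0.03
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))
```

Shifting every sample by +5 changes only the luminance term, so `ssim_global(x, [v + 5 for v in x])` stays just below 1, while an inverted signal drives the structure term, and the score, negative.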

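Table 8 LPIPS comparison of different algorithms (lower is better)

Algorithm	Average LPIPS	Best LPIPS	Worst LPIPS
Bilinear interpolation	0.291	0.133	0.517
Bicubic interpolation	0.273	0.104	0.513
Lanczos3 interpolation	0.276	0.098	0.523
Lanczos4 interpolation	0.275	0.096	0.520
Proposed algorithm	0.244	0.079	0.479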

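Table 9 Comparison of FPGA implementations

Parameter	Ref.^[19]	Ref.^[25]	This work
Image size	256×256 grayscale	256×256 grayscale	960×540 color
Interpolation algorithm	Bicubic	NEDI	Proposed
FPGA platform	Artix-7	Virtex-7	Xilinx Zynq UltraScale+
Frequency (MHz)	289.2	100	200
Slice LUTs	359	4883	19038
Slice registers	162	2705	6492
DSPs	0	42	27
*Note: FPGA resources are for a single interpolation engine.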
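Table 9 and the abstract give a 960×540 color input, a 200 MHz FPGA clock, and 60 fps, but not the output resolution or how many engines the FPGA build runs. The sketch below turns those figures into a cycle budget per output pixel for a few hypothetical scale factors (1×, 2×, 4× per axis); the reciprocal is the aggregate number of output pixels the design must produce per clock. `cycles_per_output_pixel` is a hypothetical helper.

```python
def cycles_per_output_pixel(clock_hz: float, fps: float, width: int, height: int,
                            scale: int) -> float:
    """Clock cycles available per output pixel when a width x height input is
    upscaled by `scale` in each direction at `fps` frames per second."""
    return clock_hz / (width * scale * height * scale * fps)


# 200 MHz and 60 fps on a 960x540 input come from the paper; the scale factors are hypothetical.
for s in (1, 2, 4):
    budget = cycles_per_output_pixel(200e6, 60, 960, 540, s)
    print(f"scale {s}: {budget:.3f} cycles/pixel, needs {1 / budget:.2f} pixels/cycle")
```

At 4× per axis the budget drops below one cycle per output pixel, which is the regime where producing several pixels per clock, as the parallel MAC units of Table 1 allow, becomes necessary.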

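Table 10 Comparison of ASIC implementations

Metric	VLSI’19^[26]	ISSCC’21^[22]	This work
Process	65 nm	40 nm	16 nm
Algorithm	CNN	Interpolation + pre-learning	Interpolation
Throughput (fps)	60	90	60*
Data precision	INT8/INT16	INT8/INT16	INT8/INT16
Frequency (MHz)	200	200	2000
SRAM (KB)	572	371	2.23
Gate count (M)	–	3.11	0.23
*Note: the 60 fps throughput of this work is obtained with four instantiated interpolation engines.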

References

[1]	MEIJERING E H W, ZUIDERVELD K J, and VIERGEVER M A. Image reconstruction by convolution with symmetrical piecewise nth-order polynomial kernels[J]. IEEE Transactions on Image Processing, 1999, 8(2): 192–201. doi: 10.1109/83.743854
[2]	KEYS R G. Cubic convolution interpolation for digital image processing[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(6): 1153–1160. doi: 10.1109/TASSP.1981.1163711
[3]	KWOK W and SUN H. Multi-directional interpolation for spatial error concealment[J]. IEEE Transactions on Consumer Electronics, 1993, 39(3): 455–460. doi: 10.1109/30.234620
[4]	LI Xin and ORCHARD M T. New edge-directed interpolation[J]. IEEE Transactions on Image Processing, 2001, 10(10): 1521–1527. doi: 10.1109/83.951537
[5]	CHEN Meijuan, HUANG C H, and LEE W L. A fast edge-oriented algorithm for image interpolation[J]. Image and Vision Computing, 2005, 23(9): 791–798. doi: 10.1016/j.imavis.2005.05.005
[6]	ZHANG Xiangjun and WU Xiaolin. Image interpolation by adaptive 2-D autoregressive modeling and soft-decision estimation[J]. IEEE Transactions on Image Processing, 2008, 17(6): 887–896. doi: 10.1109/TIP.2008.924279
[7]	JAKHETIYA V, KUMAR A, and TIWARI A K. Image interpolation by adaptive 2-D autoregressive modeling[C]. Proceedings of SPIE 7546, Second International Conference on Digital Image Processing, Singapore, 2010.
[8]	LIU Yiwei, JIANG Zhuqing, WANG Yibo, et al. Single-frame reconstruction for improvement of off-axis digital holographic imaging based on image interpolation[J]. Optics Letters, 2020, 45(24): 6623–6626. doi: 10.1364/OL.405578
[9]	WANG Qiang, TANG Xiaoou, and SHUM H. Patch based blind image super resolution[C]. Proceedings of the Tenth IEEE International Conference on Computer Vision, Beijing, China, 2005: 709–716.
[10]	CHAN T M, ZHANG Junping, PU Jian, et al. Neighbor embedding based super-resolution algorithm through edge detection and feature selection[J]. Pattern Recognition Letters, 2009, 30(5): 494–502. doi: 10.1016/j.patrec.2008.11.008
[11]	GAO Xinbo, ZHANG Kaibing, TAO Dacheng, et al. Joint learning for single-image super-resolution via a coupled constraint[J]. IEEE Transactions on Image Processing, 2012, 21(2): 469–480. doi: 10.1109/TIP.2011.2161482
[12]	JI Jiahuan, ZHONG Baojiang, and MA Kaikuang. Image interpolation using multi-scale attention-aware inception network[J]. IEEE Transactions on Image Processing, 2020, 29: 9413–9428. doi: 10.1109/TIP.2020.3026632
[13]	NIU Ben, WEN Weilei, REN Wenqi, et al. Single image super-resolution via a holistic attention network[C]. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 2020: 191–207.
[14]	WEI Pengxu, XIE Ziwei, LU Hannan, et al. Component divide-and-conquer for real-world image super-resolution[C]. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 2020: 101–117.
[15]	DENG Xin, ZHANG Yutong, XU Mai, et al. Deep coupled feedback network for joint exposure fusion and image super-resolution[J]. IEEE Transactions on Image Processing, 2021, 30: 3098–3112. doi: 10.1109/TIP.2021.3058764
[16]	LIN Yuting, LIU Wei, CAI Xiaowen, et al. A CNN-based quality model for image interpolation[C]. Proceedings of 2020 Cross Strait Radio Science & Wireless Technology Conference, Fuzhou, China, 2020: 1–3.
[17]	AMD. AMD FidelityFX super resolution (FSR): Changing the game in just 4 months[EB/OL]. https://www.amd.com/zh-hans/technologies/fidelityfx-super-resolution, 2021.
[18]	BURNES A. nvidia-image-scaler-dlss-rtx-november-2021-updates[EB/OL]. https://www.nvidia.com/en-us/geforce/news/gfecnt/202111/nvidia-image-scaler-dlss-rtx-november-2021-updates/, 2021.
[19]	KHALEDYAN D, AMIRANY A, JAFARI K, et al. Low-cost implementation of bilinear and bicubic image interpolation for real-time image super-resolution[C]. Proceedings of 2020 IEEE Global Humanitarian Technology Conference, Seattle, USA, 2020: 1–5.
[20]	WANG Kang, YANG Ruiqi, YANG Yizhong, et al. Design and implementation of image adaptive scaling based on second order Newton interpolation[J]. Computer Applications and Software, 2020, 37(9): 126–132, 138. doi: 10.3969/j.issn.1000-386x.2020.09.021
[21]	BOUKHTACHE S, BLAYSAT B, GRÉDIAC M, et al. FPGA-based architecture for bi-cubic interpolation: The best trade-off between precision and hardware resource consumption[J]. Journal of Real-Time Image Processing, 2021, 18(3): 901–911. doi: 10.1007/s11554-020-01035-1
[22]	SHEN H Y, LEE Y C, TONG T W, et al. 4.7 A 91mW 90fps super-resolution processor for full HD images[C]. Proceedings of 2021 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2021: 66–68.
[23]	LU Zhifang and ZHONG Baojiang. Image interpolation with predicted gradients[J]. Acta Automatica Sinica, 2018, 44(6): 1072–1085. doi: 10.16383/j.aas.2017.c160793
[24]	ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 586–595.
[25]	WU Shihao, LUO Xiaohua, ZHANG Jianwei, et al. FPGA-based hardware implementation of new edge-directed interpolation algorithm[J]. Journal of Zhejiang University (Engineering Science), 2018, 52(11): 2226–2232. doi: 10.3785/j.issn.1008-973X.2018.11.022
[26]	LEE J, SHIN D, LEE J, et al. A full HD 60 fps CNN super resolution processor with selective caching based layer fusion for mobile devices[C]. Proceedings of 2019 Symposium on VLSI Circuits, Kyoto, Japan, 2019: C302–C303.