LI Tianyang, ZHANG Fan, WANG Song, CAO Wei, CHEN Li. FPGA-Based Unified Accelerator for Convolutional Neural Network and Vision Transformer[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT230713
FPGA-Based Unified Accelerator for Convolutional Neural Network and Vision Transformer

doi: 10.11999/JEIT230713
Funds:  National Key R&D Program of China (2022YFB4500900)
  • Received Date: 2023-07-15
  • Rev Recd Date: 2023-09-27
  • Available Online: 2023-10-08
Abstract: Because traditional Field Programmable Gate Array (FPGA)-based Convolutional Neural Network (CNN) accelerators for computer vision are poorly suited to Vision Transformer networks, a unified FPGA accelerator for CNNs and Transformers is proposed. First, a generalized computation mapping method for FPGA is derived from the shared characteristics of the convolution and attention operations. Second, a nonlinear acceleration unit is designed to support the various nonlinear operations found in computer vision networks. The accelerator is then implemented on a Xilinx XCVU37P FPGA. Experimental results show that the proposed nonlinear acceleration unit improves throughput with only a small accuracy loss. ResNet-50 and ViT-B/16 achieve 589.94 GOPS and 564.76 GOPS, respectively, on the proposed accelerator. Compared with a GPU implementation, energy efficiency improves by factors of 5.19 and 7.17, respectively. Compared with other large FPGA-based designs, energy efficiency improves significantly, and computing efficiency increases by 8.02%~177.53% over other FPGA accelerators.
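The unification the abstract describes rests on a well-known observation: both convolution (after an im2col unfolding) and self-attention reduce to dense matrix multiplies, so one GEMM datapath can serve both network families. The sketch below is only an illustration of that shared mapping, not the paper's actual hardware design; the single-channel, stride-1, no-padding convolution and the toy attention shapes are simplifying assumptions.

```python
import numpy as np

def im2col(x, k):
    """Unfold k x k patches of a single-channel feature map into rows,
    so stride-1, unpadded convolution becomes one matrix multiply."""
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    cols = np.empty((oh * ow, k * k))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + k, j:j + k].ravel()
    return cols

def softmax(s):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Convolution as GEMM: (oh*ow, k*k) @ (k*k,) -> one output per patch.
x = np.arange(16.0).reshape(4, 4)
w = np.ones((3, 3))
conv_out = im2col(x, 3) @ w.ravel()  # -> [45., 54., 81., 90.]

# Attention as GEMM: scores = Q @ K^T / sqrt(d), out = softmax(scores) @ V.
rng = np.random.default_rng(0)
Q, K, V = (rng.random((4, 8)) for _ in range(3))
attn_out = softmax(Q @ K.T / np.sqrt(8)) @ V  # shape (4, 8)
```

Because both workloads reach the array as plain matrix multiplies, only the nonlinear steps (here, softmax) fall outside the GEMM path, which is why the paper pairs the unified array with a dedicated nonlinear acceleration unit.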

    Figures(10)  / Tables(5)
