A Parallelism Strategy Optimization Search Algorithm Based on Three-dimensional Deformable CNN Acceleration Architecture

QU Xinyuan; XU Yu; HUANG Zhihong; CAI Gang; FANG Zhen

doi:10.11999/JEIT210059

Volume 44 Issue 4

Apr. 2022



QU Xinyuan, XU Yu, HUANG Zhihong, CAI Gang, FANG Zhen. A Parallelism Strategy Optimization Search Algorithm Based on Three-dimensional Deformable CNN Acceleration Architecture[J]. Journal of Electronics & Information Technology, 2022, 44(4): 1503-1512. doi: 10.11999/JEIT210059



doi: 10.11999/JEIT210059 cstr: 32379.14.JEIT210059

1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
2. School of Electronic, Electrical, and Communication Engineering, University of Chinese Academy of Sciences (UCAS), Beijing 100049, China

Funds: The National Natural Science Foundation of China (61704173, 61974146), The Major Program of Beijing Science and Technology (Z171100000117019)

Received Date: 2021-01-08
Rev Recd Date: 2021-08-04

Available Online: 2021-09-09

Publish Date: 2022-04-18

Abstract

Field Programmable Gate Arrays (FPGAs) are widely used for hardware acceleration of Convolutional Neural Networks (CNNs). To improve performance, Qu et al. (2021) proposed a three-dimensional transformable CNN acceleration structure. However, this structure causes the parallelism strategy exploration space to grow explosively, so the time needed to search for the optimal parallelism surges, which severely reduces the feasibility of implementing the accelerator. To solve this problem, this paper proposes a fine-grained iterative optimization parallelism search algorithm. The algorithm applies multiple rounds of iterative data filtering to efficiently eliminate redundant parallelism schemes, compressing the search space by more than 99%. It also uses pruning to delete invalid calculation branches, reducing the calculation time from 10⁶ h to less than 10 s. The algorithm achieves outstanding performance on different kinds of FPGAs, with an average computing resource utilization (R1, R2) of up to (0.957, 0.962).
Keywords:
- Field Programmable Gate Array (FPGA)
- Convolutional Neural Network (CNN)
- Hardware acceleration
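
The two ideas named in the abstract, filtering out redundant parallelism schemes and pruning invalid search branches, can be illustrated with a small toy sketch. This is not the algorithm from the paper. The cost model (cycles as a product of ceiling divisions), the single multiplier budget, the layer dimensions, and all function names below are hypothetical assumptions chosen only to make the general technique concrete.

```python
from itertools import product
from math import ceil, prod


def filter_factors(dim, limit):
    """Keep, for each distinct ceil(dim / p), only the smallest factor p.

    A larger p with the same ceil(dim / p) costs more resources for the
    same number of steps, so it is redundant under this toy cost model.
    """
    kept, seen = [], set()
    for p in range(1, min(dim, limit) + 1):
        steps = ceil(dim / p)
        if steps not in seen:
            seen.add(steps)
            kept.append(p)
    return kept  # ascending order


def cycles(dims, factors):
    """Hypothetical cycle count: product of per-dimension step counts."""
    return prod(ceil(d / p) for d, p in zip(dims, factors))


def brute_force(dims, budget):
    """Reference search over every factor combination within the budget."""
    best = (float("inf"), None)
    for f in product(*(range(1, min(d, budget) + 1) for d in dims)):
        if prod(f) <= budget:
            c = cycles(dims, f)
            if c < best[0]:
                best = (c, f)
    return best


def pruned_search(dims, budget):
    """Depth-first search over filtered factors with two pruning rules."""
    candidates = [filter_factors(d, budget) for d in dims]
    best = [float("inf"), None]

    def dfs(i, used, partial, chosen):
        # Prune: remaining step counts are >= 1, so this branch cannot
        # end with fewer cycles than the best solution found so far.
        if partial >= best[0]:
            return
        if i == len(dims):
            best[0], best[1] = partial, tuple(chosen)
            return
        for p in candidates[i]:
            # Prune: candidates are ascending, so every later factor
            # would also exceed the resource budget.
            if used * p > budget:
                break
            dfs(i + 1, used * p, partial * ceil(dims[i] / p), chosen + [p])

    dfs(0, 1, 1, [])
    return best[0], best[1]


# Hypothetical layer: 96 output channels, 3 input channels, 55 output rows,
# with an assumed budget of 512 multipliers.
dims, budget = (96, 3, 55), 512
best_cycles, best_factors = pruned_search(dims, budget)
print(best_cycles, best_factors)
```

Under this toy model the filtered and pruned search returns the same minimum cycle count as the exhaustive search, because every discarded factor has a cheaper equivalent with an identical step count. The real accelerator's resource and performance models are more detailed than this sketch.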


References

[1]	LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. doi: 10.1109/5.726791
[2]	QU Xinyuan, HUANG Zhihong, XU Yu, et al. Cheetah: An accurate assessment mechanism and a high-throughput acceleration architecture oriented toward resource efficiency[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021, 40(5): 878–891. doi: 10.1109/TCAD.2020.3011650
[3]	REGGIANI E, RABOZZI M, NESTOROV A M, et al. Pareto optimal design space exploration for accelerated CNN on FPGA[C]. 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, 2019: 107–114. doi: 10.1109/IPDPSW.2019.00028
[4]	YU Xiaoyu, WANG Yuwei, MIAO Jie, et al. A data-center FPGA acceleration platform for convolutional neural networks[C]. 2019 29th International Conference on Field Programmable Logic and Applications (FPL), Barcelona, Spain, 2019: 151–158. doi: 10.1109/FPL.2019.00032
[5]	LIU Zhiqiang, CHOW P, XU Jinwei, et al. A uniform architecture design for accelerating 2D and 3D CNNs on FPGAs[J]. Electronics, 2019, 8(1): 65. doi: 10.3390/electronics8010065
[6]	LI Huimin, FAN Xitian, JIAO Li, et al. A high performance FPGA-based accelerator for large-scale convolutional neural networks[C]. 2016 26th International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 2016: 1–9. doi: 10.1109/FPL.2016.7577308
[7]	QIU Jiantao, WANG Jie, YAO Song, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]. The 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, California, USA, 2016: 26–35
[8]	ZHANG Xiaofan, WANG Junsong, ZHU Chao, et al. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs[C]. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, USA, 2018: 1–8. doi: 10.1145/3240765.3240801
[9]	LIU Zhiqiang, DOU Yong, JIANG Jingfei, et al. Automatic code generation of convolutional neural networks in FPGA implementation[C]. 2016 International Conference on Field-Programmable Technology (FPT), Xi’an, China, 2016: 61–68. doi: 10.1109/FPT.2016.7929190
[10]	KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84–90. doi: 10.1145/3065386
[11]	MA Yufei, CAO Yu, VRUDHULA S, et al. Optimizing the convolution operation to accelerate deep neural networks on FPGA[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018, 26(7): 1354–1367. doi: 10.1109/TVLSI.2018.2815603
[12]	GUO Kaiyuan, SUI Lingzhi, QIU Jiantao, et al. Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(1): 35–47. doi: 10.1109/TCAD.2017.2705069
[13]	ZHANG Chen, SUN Guangyu, FANG Zhenman, et al. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, 38(11): 2072–2085. doi: 10.1109/TCAD.2017.2785257
[14]	ZHANG Jialiang and LI Jing. Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network[C]. The 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, California, USA, 2017: 25–34. doi: 10.1145/3020078.3021698
[15]	LIU Zhiqiang, DOU Yong, JIANG Jingfei, et al. Throughput-optimized FPGA accelerator for deep convolutional neural networks[J]. ACM Transactions on Reconfigurable Technology and Systems, 2017, 10(3): 17. doi: 10.1145/3079758