Advanced Search
Volume 45 Issue 6
Jun.  2023
Turn off MathJax
Article Contents
WU Ruidong, LIU Bing, FU Ping, JI Xinglong, LU Wenshuai. Convolutional Neural Network Accelerator Architecture Design for Ultimate Edge Computing Scenario[J]. Journal of Electronics & Information Technology, 2023, 45(6): 1933-1943. doi: 10.11999/JEIT220130
Citation: WU Ruidong, LIU Bing, FU Ping, JI Xinglong, LU Wenshuai. Convolutional Neural Network Accelerator Architecture Design for Ultimate Edge Computing Scenario[J]. Journal of Electronics & Information Technology, 2023, 45(6): 1933-1943. doi: 10.11999/JEIT220130

Convolutional Neural Network Accelerator Architecture Design for Ultimate Edge Computing Scenario

doi: 10.11999/JEIT220130
Funds:  The National Natural Science Foundation of China (62171156)
  • Received Date: 2022-02-15
  • Rev Recd Date: 2022-07-10
  • Available Online: 2022-07-15
  • Publish Date: 2023-06-10
  • In order to meet the requirements of performance and power in Ultimate Edge Computing (UEC) scenario, a Convolutional Neural Network (CNN) accelerator architecture is proposed with 16 Bit quantization model that does not rely on external memory. The basic structure of proposed architecture is Field Programmable Gate Array (FPGA) with multi-core CNN full pipeline accelerator. On this basis, the optimization of intra-layer mapping and inter-layer fusion of accelerator is realized. Then, the evaluation of computing resource and memory resource are theoretically completed by building the corresponding model. Under the guidance of this model, the resource utilization and computing efficiency are maximized through design space exploration, and the peak computing power of accelerator is fully exploited with limited resource constraint. Finally, taking fast human detection of nano Unmanned Aerial Vehicle (UAV) as an example, the verification and analysis of architecture are completed through experiments. Experimental results show that in the inference of human body detection neural network based on Single Shot multibox Detector (SSD), the performance is achieved with the speed of frame rate 137 and 34 at 100 MHz and 25 MHz, and the corresponding power is 0.514 W and 0.263 W, respectively, which meets the performance and power requirements of real-time image processing in typical UEC scenarios such as autonomous computing of nano-UAV.
  • loading
  • [1]
    BIANCHI V, BASSOLI M, LOMBARDO G, et al. IoT wearable sensor and deep learning: An integrated approach for personalized human activity recognition in a smart home environment[J]. IEEE Internet of Things Journal, 2019, 6(5): 8553–8562. doi: 10.1109/JIOT.2019.2920283
    [2]
    施巍松, 张星洲, 王一帆, 等. 边缘计算: 现状与展望[J]. 计算机研究与发展, 2019, 56(1): 69–89. doi: 10.7544/issn1000-1239.2019.20180760

    SHI Weisong, ZHANG Xingzhou, WANG Yifan, et al. Edge computing: State-of-the-art and future directions[J]. Journal of Computer Research and Development, 2019, 56(1): 69–89. doi: 10.7544/issn1000-1239.2019.20180760
    [3]
    KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[C]. The 25th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2012: 1097–1105.
    [4]
    ROY S K, KRISHNA G, DUBEY S R, et al. HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification[J]. IEEE Geoscience and Remote Sensing Letters, 2020, 17(2): 277–281. doi: 10.1109/LGRS.2019.2918719
    [5]
    LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318–327. doi: 10.1109/TPAMI.2018.2858826
    [6]
    USAMA M, AHMAD B, SONG Enmin, et al. Attention-based sentiment analysis using convolutional and recurrent neural network[J]. Future Generation Computer Systems, 2020, 113: 571–578. doi: 10.1016/j.future.2020.07.022
    [7]
    WAN Shaohua and GOUDOS S. Faster R-CNN for multi-class fruit detection using a robotic vision system[J]. Computer Networks, 2020, 168: 107036. doi: 10.1016/j.comnet.2019.107036
    [8]
    ACHARYA J and BASU A. Deep neural network for respiratory sound classification in wearable devices enabled by patient specific model tuning[J]. IEEE Transactions on Biomedical Circuits and Systems, 2020, 14(3): 535–544. doi: 10.1109/TBCAS.2020.2981172
    [9]
    WANG Yu, YANG Jie, LIU Miao, et al. LightAMC: Lightweight automatic modulation classification via deep learning and compressive sensing[J]. IEEE Transactions on Vehicular Technology, 2020, 69(3): 3491–3495. doi: 10.1109/TVT.2020.2971001
    [10]
    WU Huaqiang, LYU Feng, ZHOU Conghao, et al. Optimal UAV caching and trajectory in aerial-assisted vehicular networks: A learning-based approach[J]. IEEE Journal on Selected Areas in Communications, 2020, 38(12): 2783–2797. doi: 10.1109/JSAC.2020.3005469
    [11]
    Bitcraze. Crazyflie 2.1[EB/OL]. https://www.bitcraze.io/products/crazyflie-2-1/, 2022.
    [12]
    PALOSSI D, LOQUERCIO A, CONTI F, et al. A 64-mW DNN-based visual navigation engine for autonomous nano-drones[J]. IEEE Internet of Things Journal, 2019, 6(5): 8357–8371. doi: 10.1109/JIOT.2019.2917066
    [13]
    NICULESCU V, LAMBERTI L, CONTI F, et al. Improving autonomous nano-drones performance via automated end-to-end optimization and deployment of DNNs[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2021, 11(4): 548–562. doi: 10.1109/JETCAS.2021.3126259
    [14]
    PALOSSI D, ZIMMERMAN N, BURRELLO A, et al. Fully onboard AI-powered human-drone pose estimation on ultralow-power autonomous flying nano-UAVs[J]. IEEE Internet of Things Journal, 2022, 9(3): 1913–1929. doi: 10.1109/JIOT.2021.3091643
    [15]
    刘勤让, 刘崇阳. 利用参数稀疏性的卷积神经网络计算优化及其FPGA加速器设计[J]. 电子与信息学报, 2018, 40(6): 1368–1374. doi: 10.11999/JEIT170819

    LIU Qinrang and LIU Chongyang. Calculation optimization for convolutional neural networks and FPGA-based accelerator design using the parameters sparsity[J]. Journal of Electronics &Information Technology, 2018, 40(6): 1368–1374. doi: 10.11999/JEIT170819
    [16]
    秦华标, 曹钦平. 基于FPGA的卷积神经网络硬件加速器设计[J]. 电子与信息学报, 2019, 41(11): 2599–2605. doi: 10.11999/JEIT190058

    QIN Huabiao and CAO Qinping. Design of convolutional neural networks hardware acceleration based on FPGA[J]. Journal of Electronics &Information Technology, 2019, 41(11): 2599–2605. doi: 10.11999/JEIT190058
    [17]
    YUAN Tian, LIU Weiqiang, HAN Jie, et al. High performance CNN accelerators based on hardware and algorithm co-optimization[J]. IEEE Transactions on Circuits and Systems I:Regular Papers, 2021, 68(1): 250–263. doi: 10.1109/TCSI.2020.3030663
    [18]
    GONG Lei, WANG Chao, LI Xi, et al. MALOC: A fully pipelined FPGA accelerator for convolutional neural networks with all layers mapped on chip[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(11): 2601–2612. doi: 10.1109/TCAD.2018.2857078
    [19]
    WANG Chao, GONG Lei, YU Qi, et al. DLAU: A scalable deep learning accelerator unit on FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017, 36(3): 513–517. doi: 10.1109/TCAD.2016.2587683
    [20]
    DING Wei, HUANG Zeyu, HUANG Zunkai, et al. Designing efficient accelerator of depthwise separable convolutional neural network on FPGA[J]. Journal of Systems Architecture, 2019, 97: 278–286. doi: 10.1016/j.sysarc.2018.12.008
    [21]
    BLOTT M, PREUßER T B, FRASER N J, et al. FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks[J]. ACM Transactions on Reconfigurable Technology and Systems, 2018, 11(3): 16. doi: 10.1145/3242897
    [22]
    BAI Lin, ZHAO Yiming, and HUANG Xinming. A CNN accelerator on FPGA using depthwise separable convolution[J]. IEEE Transactions on Circuits and Systems II:Express Briefs, 2018, 65(10): 1415–1419. doi: 10.1109/TCSII.2018.2865896
    [23]
    GUO Kaiyuan, SUI Lingzhi, QIU Jiantao, et al. Angel-eye: A complete design flow for mapping CNN onto embedded FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(1): 35–47. doi: 10.1109/TCAD.2017.2705069
    [24]
    ZHU Jiang, WANG Lizan, LIU Haolin, et al. An efficient task assignment framework to accelerate DPU-based convolutional neural network inference on FPGAs[J]. IEEE Access, 2020, 8: 83224–83237. doi: 10.1109/ACCESS.2020.2988311
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(8)  / Tables(8)

    Article Metrics

    Article views (1148) PDF downloads(240) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return