
NN-EdgeBuilder: High-performance Neural Network Inference Framework for Edge Devices

ZHANG Meng, ZHANG Yu, ZHANG Jingwei, CAO Xinye, LI He

Citation: ZHANG Meng, ZHANG Yu, ZHANG Jingwei, CAO Xinye, LI He. NN-EdgeBuilder: High-performance Neural Network Inference Framework for Edge Devices[J]. Journal of Electronics & Information Technology, 2023, 45(9): 3132-3140. doi: 10.11999/JEIT230325

doi: 10.11999/JEIT230325
Funds: The Research and Development Program of Guangdong Province (2021B1101270006), The Natural Science Foundation of Jiangsu Province (BK20201145)
Details
    About the authors:

    ZHANG Meng: male, researcher; research interests include digital signal processing, deep learning algorithms, and hardware acceleration

    ZHANG Yu: male, M.S. candidate; research interest is deep learning hardware accelerator design

    ZHANG Jingwei: male, Ph.D. candidate; research interests include computer vision and deep learning hardware accelerator design

    CAO Xinye: male, M.S. candidate; research interest is deep learning hardware accelerator design

    LI He: male, associate researcher; research interests include programmable chip (FPGA) development and system optimization

    Corresponding author:

    ZHANG Yu, zhangyu_seu@foxmail.com

  • CLC number: TN79.1

  • Abstract: Rapidly advancing neural networks have achieved great success in fields such as object detection, and efficiently and automatically deploying network models on various edge devices through a neural network inference framework is an important current research direction. To this end, this paper designs NN-EdgeBuilder, a neural network inference framework targeting edge FPGAs. It uses a design space exploration algorithm based on multi-objective Bayesian optimization to thoroughly explore the parallelism factors and quantization bit widths of each network layer, and then invokes high-performance, general-purpose hardware acceleration operators to generate low-latency, low-power neural network accelerators. NN-EdgeBuilder is used to deploy the UltraNet and VGG networks on an Ultra96-V2 FPGA. Compared with the state-of-the-art custom UltraNet accelerator, the generated UltraNet-P1 accelerator improves power consumption and energy efficiency by 17.71% and 21.54%, respectively. Compared with mainstream inference frameworks, the VGG accelerator generated by NN-EdgeBuilder improves energy efficiency by 4.40 times and digital signal processor (DSP) computational efficiency by 50.65%.
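
    To make the explored design space concrete, the sketch below shows the kind of per-layer knobs (parallelism factor and quantization bit width) the design space exploration tunes. It is a minimal illustration with assumed discrete ranges; the names and structure are not NN-EdgeBuilder's actual configuration format.

      from dataclasses import dataclass
      from itertools import product

      @dataclass(frozen=True)
      class LayerConfig:
          # Per-layer knobs tuned by the design space exploration
          # (illustrative names, not the framework's actual format).
          parallelism: int  # parallelism factor: number of MAC units for the layer
          bit_width: int    # quantization bit width for weights/activations

      # Hypothetical discrete ranges for one layer; the DSE picks one
      # combination per network layer.
      PARALLELISM_FACTORS = (1, 2, 4, 8, 16)
      BIT_WIDTHS = (4, 8, 16)

      layer_space = [LayerConfig(p, b)
                     for p, b in product(PARALLELISM_FACTORS, BIT_WIDTHS)]
      print(len(layer_space))  # 15 candidate configurations per layer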
  • Figure 1  Flow of deploying a network model with the NN-EdgeBuilder inference framework

    Figure 2  Quantization module workflow

    Figure 3  FC/Conv general-purpose compute unit

    Figure 4  Line_buffer workflow

    Figure 5  Parallelism factors control the number of MACs
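
    As a software analogue of Figure 5's idea, the following toy sketch (an illustration, not the hardware implementation) shows how a parallelism factor P splits each dot product into P interleaved partial sums, mirroring P MAC units working in parallel:

      def fc_parallel(I_fc, F_fc, P):
          # Toy analogue of Figure 5: parallelism factor P = number of MAC units.
          # MAC unit k accumulates the inputs with ci % P == k; an adder tree
          # (here: sum) combines the P partial sums per output channel.
          CI, CO = len(I_fc), len(F_fc)
          O_fc = []
          for co in range(CO):
              partial = [0.0] * P                  # one accumulator per MAC unit
              for ci in range(CI):
                  partial[ci % P] += I_fc[ci] * F_fc[co][ci]
              O_fc.append(sum(partial))            # combine the partial sums
          return O_fc

      # Example: 4 MAC units computing a 2-output, 8-input FC layer
      print(fc_parallel([1.0] * 8, [[0.5] * 8, [2.0] * 8], P=4))  # [4.0, 16.0]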

    Algorithm 1  Fully connected operation loop nest
     (1) Loop1:  for (ci = 0; ci < CI; ci++)
     (2) Loop2:   for (co = 0; co < CO; co++)
     (3)           O_fc[co] += I_fc[ci] × F_fc[co, ci]
     (4) EndLoop
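    For reference, a direct, runnable Python transliteration of Algorithm 1 (list-based tensors for illustration only; the accelerator operates on hardware buffers, not Python lists):

      def fc(I_fc, F_fc, CI, CO):
          # Algorithm 1: fully connected loop nest.
          # I_fc[ci] inputs, F_fc[co][ci] weights, O_fc[co] outputs.
          O_fc = [0.0] * CO
          for ci in range(CI):        # Loop1: input channels
              for co in range(CO):    # Loop2: output channels
                  O_fc[co] += I_fc[ci] * F_fc[co][ci]
          return O_fc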
    Algorithm 2  Convolution operation loop nest
     (1) Loop1:  for (wo = 0; wo < WO; wo++)
     (2) Loop2:   for (ho = 0; ho < HO; ho++)
     (3) Loop3:    for (co = 0; co < CO; co++)
     (4) Loop4:     for (ci = 0; ci < CI; ci++)
     (5) Loop5:      for (hf = 0; hf < HF; hf++)
     (6) Loop6:       for (wf = 0; wf < WF; wf++)
     (7)               O_conv[co, ho, wo] += I_conv[ci, ho+hf, wo+wf] × F_conv[co, ci, hf, wf]
     (8) EndLoop
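    Likewise, a runnable Python transliteration of Algorithm 2. It assumes stride 1 and no padding, so each input channel must be at least (HO+HF−1) × (WO+WF−1); those assumptions are mine, not stated in the listing:

      def conv(I_conv, F_conv, CI, CO, HO, WO, HF, WF):
          # Algorithm 2: convolution loop nest (stride 1, no padding assumed).
          # I_conv[ci][h][w] inputs, F_conv[co][ci][hf][wf] filters,
          # O_conv[co][ho][wo] outputs.
          O_conv = [[[0.0] * WO for _ in range(HO)] for _ in range(CO)]
          for wo in range(WO):                      # Loop1: output width
              for ho in range(HO):                  # Loop2: output height
                  for co in range(CO):              # Loop3: output channels
                      for ci in range(CI):          # Loop4: input channels
                          for hf in range(HF):      # Loop5: filter height
                              for wf in range(WF):  # Loop6: filter width
                                  O_conv[co][ho][wo] += (I_conv[ci][ho + hf][wo + wf]
                                                         * F_conv[co][ci][hf][wf])
          return O_conv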
    Algorithm 3  Bayesian optimization flow
     Input: design space $F$, surrogate model ${\rm{GP}}_M$, acquisition function ${\rm{EHVIC}}$, objective function $\varphi(x)$, constraints $C(x)$
     Output: Pareto front $P(\mathcal{V})$ of the design space of accelerators automatically deployed by the NN-EdgeBuilder inference framework
     (1) Sample within $F$ to obtain a dataset $D_{\varphi} = ({\boldsymbol{X}},{\boldsymbol{Y}})$ of $J$ samples and a constraint set $D_{C} = \{ C(x)\}$
     (2) while !(stop condition) do
     (3)  Fit the surrogate model ${\rm{GP}}_M$ to the sample set $D_{\varphi}$ and the constraint set $D_{C}$
     (4)  For all $p \in P_n$, compute the expected hypervolume improvement ${\rm{EHVI}}(x)$ and the expected constraint satisfaction ${\rm{CS}}(p)$
     (5)  Find the maximizer of the acquisition function, $x^{J+1} = \arg\max_{x \in F} {\rm{EHVIC}}(x)$, and select it as the new sample point
     (6)  Run the Vivado tool flow to obtain the exact objective values $\varphi(x^{J+1})$ and constraints $C(x^{J+1})$
     (7)  Update the dataset $D_{\varphi}$ and the constraint set $D_{C}$
     (8) end while
     (9) return the Pareto front $P(\mathcal{V})$ of the accelerator design space
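    The loop structure of Algorithm 3 can be sketched in Python as below. This is a schematic toy, not NN-EdgeBuilder's implementation: the surrogate and acquisition helpers are deliberately simplistic stand-ins (a real implementation fits a Gaussian process and computes EHVIC), and vivado_eval fakes the Vivado tool flow with made-up formulas.

      import random

      def vivado_eval(x):
          # Stand-in for step (6): the real flow runs the Vivado toolchain to
          # get exact latency/power and resource usage for a design point.
          latency, power = sum(x), max(x)          # two objectives to minimize
          feasible = sum(x) < 20                   # toy resource constraint C(x)
          return (latency, power), feasible

      def fit_surrogate(X, Y):
          # Step (3), toy version: nearest-neighbour prediction stands in for
          # fitting a Gaussian process surrogate GP_M.
          def predict(x):
              i = min(range(len(X)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, X[j])))
              return Y[i]
          return predict

      def pareto_front(points):
          # Non-dominated points under "minimize every objective".
          return [p for p in points
                  if not any(q != p and all(a <= b for a, b in zip(q, p))
                             for q in points)]

      def ehvic(predict, x, front):
          # Steps (4)-(5), toy version: count how many current Pareto points the
          # prediction would dominate. A real implementation computes the
          # constrained expected hypervolume improvement (EHVIC) under the GP.
          y = predict(x)
          return sum(all(a <= b for a, b in zip(y, p)) for p in front)

      def bayesian_dse(design_space, n_init=5, budget=15):
          X = random.sample(design_space, n_init)            # step (1)
          evals = [vivado_eval(x) for x in X]
          while len(X) < budget:                             # step (2)
              predict = fit_surrogate(X, [y for y, _ in evals])
              front = pareto_front([y for y, ok in evals if ok])
              x_next = max((x for x in design_space if x not in X),
                           key=lambda x: ehvic(predict, x, front))
              X.append(x_next)                               # steps (5)-(7)
              evals.append(vivado_eval(x_next))
          return pareto_front([y for y, ok in evals if ok])  # step (9)

      # Example: design points are (parallelism factor, bit width) pairs.
      space = [(p, b) for p in (1, 2, 4, 8, 16) for b in (4, 8, 16)]
      print(bayesian_dse(space))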

表  1  Performance comparison of UltraNet accelerators

    Accelerator    IoU    FPS    Energy (J)  GOPS   GOPS/W
    UltraNet-P1    0.702  2107   30.2        387.7  319.9
    UltraNet-P2    0.703  2090   33.0        384.6  292.8
    SEUer          0.703  2020   36.7        371.7  263.2
    ultrateam      0.703  2266   40.3        416.9  239.7
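    As a consistency check, the abstract's 21.54% energy-efficiency gain for UltraNet-P1 can be reproduced from the last column of Table 1, taking SEUer as the baseline:

      # Energy-efficiency (GOPS/W) gain of UltraNet-P1 over SEUer (Table 1).
      gain = 319.9 / 263.2 - 1
      print(f"{gain:.2%}")  # 21.54%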

表  2  Performance comparison of NN-EdgeBuilder and other inference frameworks deploying the VGG network

                             NN-EdgeBuilder               DeepBurning-SEG[6]  fpgaConvNet[7]  HybridDNN[8]  DNNBuilder[9]
    Supported DL frameworks  PyTorch, TensorFlow & Keras  –                   Caffe & Torch   –             Caffe
    FPGA platform            ZU3EG                        ZU3EG               XC7Z020         XC7Z020       XC7Z045
    Frequency (MHz)          250                          200                 125             100           200
    DSP                      360                          264                 220             220           680
    Quantization precision   4 bit                        8 bit               16 bit          16 bit        8 bit
    GOPS                     418                          203                 48              83            524
    GOPS/DSP                 1.16                         0.77                0.22            0.38          0.77
    GOPS/W                   32                           0.2                 7.33            2.07          2.8
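    Similarly, the abstract's 50.65% DSP computational-efficiency gain follows from the GOPS/DSP row of Table 2 (1.16 for NN-EdgeBuilder versus 0.77 for the best competing framework):

      # DSP efficiency (GOPS/DSP) gain of NN-EdgeBuilder over the best
      # competing framework in Table 2.
      gain = 1.16 / 0.77 - 1
      print(f"{gain:.2%}")  # 50.65%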
  • [1] SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. The 3rd International Conference on Learning Representations, San Diego, USA, 2015: 1–14.
    [2] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. The 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [3] ZHANG Meng, ZHANG Jingwei, LI Guoqing, et al. Efficient hardware optimization strategies for deep neural networks acceleration chip[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1510–1517. doi: 10.11999/JEIT210002
    [4] ZHANG Xiaofan, LU Haoming, HAO Cong, et al. SkyNet: a hardware-efficient method for object detection and tracking on embedded systems[C]. Machine Learning and Systems, Austin, USA, 2020: 216–229.
    [5] LI Guoqing, ZHANG Jingwei, ZHANG Meng, et al. Efficient depthwise separable convolution accelerator for classification and UAV object detection[J]. Neurocomputing, 2022, 490: 1–16. doi: 10.1016/j.neucom.2022.02.071
    [6] CAI Xuyi, WANG Ying, MA Xiaohan, et al. DeepBurning-SEG: Generating DNN accelerators of segment-grained pipeline architecture[C]. 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), Chicago, USA, 2022: 1396–1413.
    [7] VENIERIS S I and BOUGANIS C S. fpgaConvNet: Mapping regular and irregular convolutional neural networks on FPGAs[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(2): 326–342. doi: 10.1109/TNNLS.2018.2844093
    [8] YE Hanchen, ZHANG Xiaofan, HUANG Zhize, et al. HybridDNN: A framework for high-performance hybrid DNN accelerator design and implementation[C]. 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2020: 1–6.
    [9] ZHANG Xiaofan, WANG Junsong, ZHU Chao, et al. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs[C]. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, USA, 2018: 1–8.
    [10] BANNER R, NAHSHAN Y, and SOUDRY D. Post training 4-bit quantization of convolutional networks for rapid-deployment[C]. The 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 714.
    [11] DUARTE J, HAN S, HARRIS P, et al. Fast inference of deep neural networks in FPGAs for particle physics[J]. Journal of Instrumentation, 2018, 13: P07027. doi: 10.1088/1748-0221/13/07/P07027
    [12] GHIELMETTI N, LONCAR V, PIERINI M, et al. Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml[J]. Machine Learning: Science and Technology, 2022, 3(4): 045011. doi: 10.1088/2632-2153/ac9cb5
    [13] ZHANG Zheng, CHEN Tinghuan, HUANG Jiaxin, et al. A fast parameter tuning framework via transfer learning and multi-objective bayesian optimization[C]. The 59th ACM/IEEE Design Automation Conference, San Francisco, USA, 2022: 133–138. doi: 10.1145/3489517.3530430
    [14] HUTTER F, HOOS H H, and LEYTON-BROWN K. Sequential model-based optimization for general algorithm configuration[C]. The 5th International Conference on Learning and Intelligent Optimization, Rome, Italy, 2011: 507–523.
    [15] ZHAN Dawei and XING Huanlai. Expected improvement for expensive optimization: A review[J]. Journal of Global Optimization, 2020, 78(3): 507–544. doi: 10.1007/s10898-020-00923-x
    [16] EMMERICH M T M, DEUTZ A H, and KLINKENBERG J W. Hypervolume-based expected improvement: Monotonicity properties and exact computation[C]. 2011 IEEE Congress of Evolutionary Computation (CEC), New Orleans, USA, 2011: 2147–2154.
    [17] ABDOLSHAH M, SHILTON A, RANA S, et al. Expected hypervolume improvement with constraints[C]. 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018: 3238–3243.
Publication history
  • Received: 2023-04-26
  • Revised: 2023-08-23
  • Available online: 2023-08-28
  • Issue published: 2023-09-27
