YOMANet-Accel: A Lightweight Algorithm Accelerator for Pedestrians and Vehicles Detection at the Edge

CHEN Ningjiang, LU Yaozong

Citation: CHEN Ningjiang, LU Yaozong. YOMANet-Accel: A Lightweight Algorithm Accelerator for Pedestrians and Vehicles Detection at the Edge[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250059


doi: 10.11999/JEIT250059 cstr: 32379.14.JEIT250059
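Details
    Author biographies:

    CHEN Ningjiang: male, PhD, professor. Research interests: intelligent software engineering, cloud computing, big data, distributed computing, and edge computing.

    LU Yaozong: male, master's student. Research interest: edge computing.

    Corresponding author:

    LU Yaozong, luyaozong9725@163.com

  • CLC number: TP389.1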


Funds: The National Natural Science Foundation of China (62162003), The Central Guidance on Local Science and Technology Development Fund of Guangxi Province (GuikeZY24212059)
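  • Abstract: Pedestrian and vehicle detection in autonomous-driving edge computing scenarios is hard to deploy because the models have high computational complexity and large parameter counts. To address this, this paper proposes a lightweight neural network model, YOMANet (Yolo Model Adaptation Network), and designs a YOMANet accelerator (YOMANet-Accel) on a heterogeneous FPGA platform to accelerate pedestrian and vehicle detection at the edge. The backbone of YOMANet uses the lightweight network MobileNetv2 to substantially reduce the number of model parameters, the neck replaces standard convolution with depthwise separable convolution to speed up training, and a Normalization-based Attention Module (NAM) is embedded in the head to strengthen the network's ability to capture fine details. To deploy YOMANet on a Field-Programmable Gate Array (FPGA) platform, the paper applies loop tiling to the convolution operation at the task level to reorder the inner and outer loops, and at the operation level designs a multiply-add tree for each Processing Engine (PE) so that multiple multiply-add operations execute simultaneously, improving parallel computing efficiency. In addition, a double-buffering mechanism is used for data storage to reduce data transfer latency, and weights and activations are quantized to int8 to reduce resource consumption. Experimental results show that YOMANet achieves strong detection accuracy and speed on the training platform, detects small and occluded targets well, and effectively reduces false and missed detections. After deployment on the hardware platform, YOMANet-Accel maintains a high level of detection quality and a good energy efficiency ratio, making effective use of the parallelism of the FPGA.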
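The task-level loop tiling and the PE multiply-add tree mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: the tile sizes `tm` and `tc`, the function names, and the choice to tile over output and input channels are assumptions made for illustration only. The outer loops walk over tiles, while the innermost loops stand in for the work a PE array would perform in parallel, with partial products summed by pairwise reduction.

```python
from itertools import product


def adder_tree(values):
    """Sum a list by pairwise reduction, the way a hardware adder tree would."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:  # pad odd-length levels with a zero
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def conv2d_direct(x, w):
    """Reference convolution: x[C][H][W], w[M][C][K][K], stride 1, no padding."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, K = len(w), len(w[0][0])
    Ho, Wo = H - K + 1, W - K + 1
    out = [[[0] * Wo for _ in range(Ho)] for _ in range(M)]
    for m, r, c in product(range(M), range(Ho), range(Wo)):
        out[m][r][c] = sum(
            x[ch][r + i][c + j] * w[m][ch][i][j]
            for ch, i, j in product(range(C), range(K), range(K))
        )
    return out


def conv2d_tiled(x, w, tm=2, tc=2):
    """Same convolution with channel loops tiled (tm, tc are hypothetical tile sizes)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    M, K = len(w), len(w[0][0])
    Ho, Wo = H - K + 1, W - K + 1
    out = [[[0] * Wo for _ in range(Ho)] for _ in range(M)]
    for m0 in range(0, M, tm):          # outer loop: output-channel tiles
        for c0 in range(0, C, tc):      # outer loop: input-channel tiles
            for r, c in product(range(Ho), range(Wo)):
                for i, j in product(range(K), range(K)):
                    # inner loops: one tile of work, unrolled in hardware
                    for m in range(m0, min(m0 + tm, M)):
                        products = [
                            x[ch][r + i][c + j] * w[m][ch][i][j]
                            for ch in range(c0, min(c0 + tc, C))
                        ]
                        out[m][r][c] += adder_tree(products)
    return out
```

With integer inputs the tiled and direct versions produce identical outputs, which is a quick way to confirm that reordering the loops does not change the result.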
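  • Figure 1  Architecture of the YOMANet lightweight neural network model

    Figure 2  Inverted residual structure of the Bottleneck module

    Figure 3  Standard convolution process

    Figure 4  Depthwise separable convolution process

    Figure 5  Channel attention submodule

    Figure 6  Spatial attention submodule

    Figure 7  Overall architecture of YOMANet-Accel

    Figure 8  Loop tiling technique

    Figure 9  PE multiply-add tree design

    Figure 10  Double-buffering mechanism

    Figure 11  Proportion of each class in the target dataset

    Figure 12  Precision, Recall, and AP values for the target classes

    Figure 13  Comparison of image detection results for different lightweight algorithms

    Figure 14  Performance of the YOMANet algorithm on different platforms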
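Figures 3 and 4 contrast standard convolution with depthwise separable convolution. As a rough illustration of why this substitution shrinks a model, the sketch below counts weights for both forms using the standard formulas (bias terms ignored). The layer dimensions in the usage note are hypothetical and are not taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def reduction_ratio(k, c_in, c_out):
    """Separable / standard weight ratio; equals 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)
```

For a hypothetical 3×3 layer with 32 input and 64 output channels, `standard_conv_params(3, 32, 64)` gives 18432 weights and `depthwise_separable_params(3, 32, 64)` gives 2336, roughly one eighth as many.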

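    Table 1  Structure of the YOMANet backbone network

    | Input | Operator | t | c | s |
    |---|---|---|---|---|
    | 416×416×3 | Conv2d | - | 32 | 2 |
    | 208×208×32 | Bottleneck2-1 | 1 | 16 | 1 |
    | 208×208×16 | Bottleneck3-1 | 6 | 16 | 2 |
    | 104×104×16 | Bottleneck3-2 | 6 | 24 | 1 |
    | 104×104×24 | Bottleneck4-1 | 6 | 24 | 2 |
    | 52×52×24 | Bottleneck4-2 | 6 | 24 | 1 |
    | 52×52×24 | Bottleneck4-3 | 6 | 32 | 1 |
    | 52×52×32 | Bottleneck5-1 | 6 | 32 | 2 |
    | 26×26×32 | Bottleneck5-2 | 6 | 32 | 1 |
    | 26×26×32 | Bottleneck5-3 | 6 | 32 | 1 |
    | 26×26×32 | Bottleneck5-4 | 6 | 64 | 1 |
    | 26×26×64 | Bottleneck6-1 | 6 | 64 | 1 |
    | 26×26×64 | Bottleneck6-2 | 6 | 64 | 1 |
    | 26×26×64 | Bottleneck6-3 | 6 | 96 | 1 |
    | 26×26×96 | Bottleneck7-1 | 6 | 96 | 2 |
    | 13×13×96 | Bottleneck7-2 | 6 | 96 | 1 |
    | 13×13×96 | Bottleneck7-3 | 6 | 160 | 1 |
    | 13×13×160 | Bottleneck8-1 | 6 | 320 | 1 |
    | 13×13×320 | DSConv×3 | - | - | - |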
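The backbone table can be sanity-checked by tracing feature-map shapes from layer to layer. The sketch below assumes the usual MobileNetV2 reading of the columns, which this page does not state explicitly: `t` is the expansion factor, `c` is the number of output channels, and `s` is the stride, with a stride of 2 halving the spatial size. Under those assumptions, each layer's output should equal the next layer's listed input.

```python
# (input size, input channels, operator, t, c, s), transcribed from Table 1
BACKBONE = [
    (416, 3, "Conv2d", None, 32, 2),
    (208, 32, "Bottleneck2-1", 1, 16, 1),
    (208, 16, "Bottleneck3-1", 6, 16, 2),
    (104, 16, "Bottleneck3-2", 6, 24, 1),
    (104, 24, "Bottleneck4-1", 6, 24, 2),
    (52, 24, "Bottleneck4-2", 6, 24, 1),
    (52, 24, "Bottleneck4-3", 6, 32, 1),
    (52, 32, "Bottleneck5-1", 6, 32, 2),
    (26, 32, "Bottleneck5-2", 6, 32, 1),
    (26, 32, "Bottleneck5-3", 6, 32, 1),
    (26, 32, "Bottleneck5-4", 6, 64, 1),
    (26, 64, "Bottleneck6-1", 6, 64, 1),
    (26, 64, "Bottleneck6-2", 6, 64, 1),
    (26, 64, "Bottleneck6-3", 6, 96, 1),
    (26, 96, "Bottleneck7-1", 6, 96, 2),
    (13, 96, "Bottleneck7-2", 6, 96, 1),
    (13, 96, "Bottleneck7-3", 6, 160, 1),
    (13, 160, "Bottleneck8-1", 6, 320, 1),
]
DSCONV_INPUT = (13, 320)  # listed input of the final DSConv×3 stage


def find_shape_mismatches(layers, final_input):
    """Return (operator, computed output, listed next input) for every mismatch."""
    mismatches = []
    for idx, (size, _channels, name, _t, c, s) in enumerate(layers):
        computed = (size // s, c)  # assumed: stride divides size, c is output channels
        listed = layers[idx + 1][:2] if idx + 1 < len(layers) else final_input
        if computed != listed:
            mismatches.append((name, computed, listed))
    return mismatches
```

Calling `find_shape_mismatches(BACKBONE, DSCONV_INPUT)` returns an empty list, so the table is internally consistent under this reading of the columns.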

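    Table 2  Performance comparison of different algorithms on the GPU platform

    | Model | Backbone | Input size | Data type | mAP@0.5 (%) | Size (MB) | FPS (frames/s) |
    |---|---|---|---|---|---|---|
    | Faster RCNN | VGG16 | 600×1000 | float32 | 90.13 | 500.67 | 18 |
    | SSD | VGG16 | 512×512 | float32 | 88.13 | 287.64 | 28 |
    | YOLOv4 | CSP-DarkNet53 | 416×416 | float32 | 88.29 | 249.48 | 39 |
    | YOLOv5 | CSP-DarkNet53 | 640×640 | float32 | 88.72 | 223.62 | 51 |
    | CCBA-NMS-YD[7] | VGG16 | 512×512 | float32 | 87.06 | - | - |
    | YOLOv3-Improved[8] | Darknet53 | 416×416 | float32 | 86.24 | - | - |
    | YOLOv5s-RFB-s-ASFF[9] | CSP-RFB-s-ASFF | 640×640 | float32 | 84.01 | - | 61 |
    | YOLOP-E[28] | EfficientNetv2 | - | float32 | 79.20 | 27.6 | 41.6 |
    | YOLOv4-tiny | CSP-Darknet53-tiny | 416×416 | float32 | 80.69 | 37.94 | 74 |
    | YOLOv5s | CSP-DarkNet53 | 640×640 | float32 | 83.51 | 34.46 | 78 |
    | YOLOv7-tiny | CSP-PANet | 416×416 | float32 | 86.68 | 32.75 | 84 |
    | YOMANet | MobileNetv2 | 416×416 | float32 | 88.26 | 30.95 | 80 |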

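    Table 3  Performance comparison in the ablation experiments

    | MobileNetv2 | DSConv | NAM | Data type | mAP (%) | Size (MB) | FPS (frames/s) | Power (W) |
    |---|---|---|---|---|---|---|---|
    | × | × | × | float32 | 88.29 | 249.48 | 39 | 167.436 |
    | ✓ | × | × | float32 | 86.76 | 80.24 | 63 | 123.481 |
    | ✓ | ✓ | × | float32 | 86.37 | 28.89 | 84 | 106.274 |
    | ✓ | ✓ | ✓ | float32 | 88.26 | 30.95 | 80 | 108.915 |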

    Table 4  Comparison before and after data quantization

    | Model | Data type | mAP@0.5 (%) | Size (MB) |
    |---|---|---|---|
    | YOMANet | float32 | 88.26 | 30.95 |
    | YOMANet | int8 | 86.23 | 7.76 |
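The size reduction in the quantization table is close to the 4× expected when 32-bit values are stored as 8-bit integers. The sketch below shows one common int8 scheme, symmetric per-tensor quantization; this page does not say which scheme the paper uses, so the scheme and the function names are assumptions for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0   # one scale for the whole tensor
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


def size_ratio(size_before_mb, size_after_mb):
    """Compression factor between two model sizes."""
    return size_before_mb / size_after_mb
```

With the table values, `size_ratio(30.95, 7.76)` is about 3.99, while mAP@0.5 falls from 88.26% to 86.23%, a drop of 2.03 percentage points.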

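    Table 5  Accelerator performance comparison with related work

    | | Ref. [11] | Ref. [12] | Ref. [13] | Ref. [14] | Ref. [29] | This work |
    |---|---|---|---|---|---|---|
    | Model | YOLOv4-tiny | YOLOv3-tiny | YOLOv5s | Ultranet | YOLOv3-tiny | YOMANet |
    | Backbone | CSP-Darknet53 | DarkNet53 | CSP-DarkNet53 | VGG16 | DarkNet53 | MobileNetv2 |
    | FPGA | ZYNQ 7020 | Nexys A7-100T | ZYNQ 7020 | ZYNQ ZU3EG | ZYNQ XCZU9EG | ZYNQ 7020 |
    | DSP | 220 | 240 | 220 | 360 | 298 | 220 |
    | Data type | 16 bit | int8 | 16 bit | 4 bit | 16 bit | int8 |
    | Power (W) | 2.750 | 2.203 | 3.039 | 6.650 | 4.120 | 7.402 |
    | GOPS | - | 95.08 | 30.10 | 126.72 | 96.60 | 100.23 |
    | GOPS/DSP | - | 0.396 | 0.137 | 0.352 | 0.324 | 0.456 |
    | GOPS/W | - | 43.16 | 9.90 | 19.06 | 23.45 | 13.54 |
    | Size (MB) | 23.70 | - | - | 15.18 | - | 7.76 |
    | FPS (frames/s) | - | 76.75 | - | 220.76 | 17.30 | 40.15 |
    | mAP@0.5 (%) | 77.80 | 81.19 | 40.30 | - | 31.50 | 86.23 |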
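The two efficiency rows in the accelerator comparison follow directly from the throughput, DSP count, and power rows. The short sketch below recomputes them from the table values; the dictionary layout and function name are illustrative choices, and Ref. [11] is omitted because the table reports no GOPS figure for it.

```python
# Throughput (GOPS), DSP count, and power (W) transcribed from Table 5
ACCELERATORS = {
    "Ref. [12]": {"gops": 95.08, "dsp": 240, "power_w": 2.203},
    "Ref. [13]": {"gops": 30.10, "dsp": 220, "power_w": 3.039},
    "Ref. [14]": {"gops": 126.72, "dsp": 360, "power_w": 6.650},
    "Ref. [29]": {"gops": 96.60, "dsp": 298, "power_w": 4.120},
    "This work": {"gops": 100.23, "dsp": 220, "power_w": 7.402},
}


def efficiency(entry):
    """Return (GOPS per DSP, GOPS per watt), rounded as in the table."""
    gops_per_dsp = round(entry["gops"] / entry["dsp"], 3)
    gops_per_watt = round(entry["gops"] / entry["power_w"], 2)
    return gops_per_dsp, gops_per_watt


if __name__ == "__main__":
    for name, entry in ACCELERATORS.items():
        per_dsp, per_watt = efficiency(entry)
        print(f"{name}: {per_dsp} GOPS/DSP, {per_watt} GOPS/W")
```

The recomputed values match the table: this work has the highest GOPS/DSP (0.456) of the listed designs, while Ref. [12] has the highest GOPS/W (43.16).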
  • [1] ZHAN Jiao, LIU Jingnan, WU Yejun, et al. Multi-task visual perception for object detection and semantic segmentation in intelligent driving[J]. Remote Sensing, 2024, 16(10): 1774. doi: 10.3390/rs16101774.
    [2] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 779–788. doi: 10.1109/CVPR.2016.91.
    [3] TAN Yusong, LI Tian, and ZHANG Yusen. Research on neural network model generation and deployment for edge intelligence[J]. Computer Engineering, 2024, 50(8): 1–12. doi: 10.19678/j.issn.1000-3428.0068554.
    [4] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
    [5] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 21–37. doi: 10.1007/978-3-319-46448-0_2.
    [6] WEI Hongyang, ZHANG Qianqian, HAN Jingjing, et al. SARNet: Spatial Attention Residual Network for pedestrian and vehicle detection in large scenes[J]. Applied Intelligence, 2022, 52(15): 17718–17733. doi: 10.1007/s10489-022-03217-9.
    [7] YUAN Zhenhao, WANG Zhiwen, and ZHANG Ruonan. CCBA-NMS-YD: A vehicle pedestrian detection and tracking method based on improved YOLOv7 and DeepSort[J]. World Electric Vehicle Journal, 2024, 15(7): 309. doi: 10.3390/wevj15070309.
    [8] WANG Qiming, HE Zilin, ZHANG Donglin, et al. Research on pedestrian and vehicle detection method based on YOLOv3 in foggy scene[J]. Control Engineering of China, 2024, 31(3): 510–517. doi: 10.14107/j.cnki.kzgc.20211118.
    [9] HU Dandan and ZHANG Zhongting. Road target detection algorithm for autonomous driving scenarios based on improved YOLOv5s[J]. CAAI Transactions on Intelligent Systems, 2024, 19(3): 653–660. doi: 10.11992/tis.202206034.
    [10] WANG Haotian, ZHAO Yinghai, and GAO Fan. A convolutional neural network accelerator based on FPGA for buffer optimization[C]. Proceedings of 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 2021: 2362–2367. doi: 10.1109/IAEAC50856.2021.9390606.
    [11] ZHAO Sijie, GAO Shangshang, WANG Rugang, et al. Acceleration and implementation of convolutional neural networks based on FPGA[J]. Digital Signal Processing, 2023, 141: 104188. doi: 10.1016/j.dsp.2023.104188.
    [12] KIM M, OH K, CHO Y, et al. A low-latency FPGA accelerator for YOLOv3-tiny with flexible layerwise mapping and dataflow[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2024, 71(3): 1158–1171. doi: 10.1109/TCSI.2023.3335949.
    [13] LIU Qian, WANG Linlin, and ZHOU Wenbo. Design of a YOLOv5s network efficient convolution accelerator powered by FPGA[J]. Telecommunication Engineering, 2024, 64(3): 366–375. doi: 10.20079/j.issn.1001-893x.230216003.
    [14] BAO Zhenshan, GUO Junnan, ZHANG Wenbo, et al. UltraAcc: A customized low power and high performance CNN accelerator with dataflow on FPGAs[J]. Chinese Journal of Computers, 2023, 46(6): 1139–1155. doi: 10.11897/SP.J.1016.2023.01139.
    [15] WANG C Y, BOCHKOVSKIY A, and LIAO H Y M. Scaled-YOLOv4: Scaling cross stage partial network[C]. Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 13024–13033. doi: 10.1109/CVPR46437.2021.01283.
    [16] JOCHER G, STOKEN A, BOROVEC J, et al. ultralytics/YOLOv5: V4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration[Z]. 2021. doi: 10.5281/ZENODO.4418161.
    [17] WANG C Y, BOCHKOVSKIY A, and LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]. Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 7464–7475. doi: 10.1109/CVPR52729.2023.00721.
    [18] SANDLER M, HOWARD A, ZHU Menglong, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4510–4520. doi: 10.1109/CVPR.2018.00474.
    [19] LIU Yichao, SHAO Zongru, TENG Yueyang, et al. NAM: Normalization-based attention module[EB/OL]. https://arxiv.org/abs/2111.12419, 2021.
    [20] ZHANG Xiangyu, ZHOU Xinyu, LIN Mengxiao, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6848–6856. doi: 10.1109/CVPR.2018.00716.
    [21] TAN Mingxing and LE Q. EfficientNet: Rethinking model scaling for convolutional neural networks[C]. Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, 2019: 6105–6114.
    [22] HAN Kai, WANG Yunhe, TIAN Qi, et al. GhostNet: More features from cheap operations[C]. Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 1577–1586. doi: 10.1109/CVPR42600.2020.00165.
    [23] HOWARD A G, ZHU Menglong, CHEN Bo, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. https://arxiv.org/abs/1704.04861, 2017.
    [24] HU Jie, SHEN Li, SUN Gang, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011–2023. doi: 10.1109/TPAMI.2019.2913372.
    [25] WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]. Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 11531–11539. doi: 10.1109/CVPR42600.2020.01155.
    [26] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 3–19. doi: 10.1007/978-3-030-01234-2_1.
    [27] CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 3213–3223. doi: 10.1109/CVPR.2016.350.
    [28] LIU Yulin, LI Gang, HAO Liguo, et al. Research on a lightweight panoramic perception algorithm for electric autonomous mini-buses[J]. World Electric Vehicle Journal, 2023, 14(7): 179. doi: 10.3390/wevj14070179.
    [29] REN Shiwei, LIU Chaojia, LI Jianzheng, et al. Efficient hardware acceleration system design for end-to-end object detection neural network[J]. Transactions of Beijing Institute of Technology, 2022, 42(12): 1312–1320. doi: 10.15918/j.tbit1001-0645.2022.004.
Publication history
  • Received: 2025-01-22
  • Revised: 2025-06-30
  • Published online: 2025-07-04