LI Wei, CHEN Yi, CHEN Tao, NAN Longmei, DU Yiran. A Research and Design of Reconfigurable CNN Co-Processor for Edge Computing[J]. Journal of Electronics & Information Technology, 2024, 46(4): 1499-1512. doi: 10.11999/JEIT230509

A Research and Design of Reconfigurable CNN Co-Processor for Edge Computing

doi: 10.11999/JEIT230509
Funds:  The Fundamental Enhancement Program Focused Essential Research Projects (2019-JCJQ-ZD-187-00-02)
  • Received Date: 2023-05-29
  • Rev Recd Date: 2023-12-04
  • Available Online: 2023-12-25
  • Publish Date: 2024-04-24
Abstract: With the development of deep learning, the parameter count and computational cost of Convolutional Neural Networks (CNNs) have increased dramatically, which greatly raises the cost of deploying CNN algorithms on edge devices. To ease deployment and reduce the inference latency and energy consumption of CNNs at the edge, a reconfigurable CNN co-processor for edge computing is proposed. Based on a channel-wise processing dataflow, a two-level distributed storage scheme is proposed to address the power overhead and performance degradation caused by heavy data movement between Processing Elements (PEs) and by large-scale on-chip migration of intermediate data. To avoid complex data interconnection and propagation mechanisms in the PE array and to reduce control complexity, a flexible local access mechanism and a padding mechanism based on address translation are proposed; together they support conventional convolution, depthwise separable convolution, pooling, and fully connected operations. The proposed co-processor contains 256 PEs and 176 kB of on-chip SRAM. Synthesized and laid out in a 55-nm CMOS process (TT corner, 25 °C, 1.2 V), the co-processor achieves a maximum clock frequency of 328 MHz and occupies an area of 4.41 mm². Its peak performance is 163.8 GOPs at 320 MHz, and its area efficiency is 37.14 GOPs/mm². The energy efficiency on LeNet-5 and MobileNet is 210.7 GOPs/W and 340.08 GOPs/W, respectively, which meets the energy-efficiency and performance requirements of edge intelligent computing scenarios.
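
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that each PE completes one multiply-accumulate per cycle and that a multiply-accumulate counts as two operations; the abstract does not state this, but the assumption reproduces the reported numbers. The variable names are illustrative only.

```python
# Sanity check of the peak-performance and area-efficiency figures.
# ASSUMPTION (not stated in the abstract): each PE performs one
# multiply-accumulate (MAC) per clock cycle, counted as 2 operations.

NUM_PES = 256          # processing elements (from the abstract)
OPS_PER_MAC = 2        # assumed: 1 multiply + 1 add
CLOCK_HZ = 320e6       # frequency at which peak performance is quoted
AREA_MM2 = 4.41        # reported area

# Peak throughput in giga-operations per second.
peak_gops = NUM_PES * OPS_PER_MAC * CLOCK_HZ / 1e9

# The abstract quotes the peak rounded to one decimal place and
# appears to derive area efficiency from that rounded value.
peak_gops_reported = round(peak_gops, 1)
area_eff = round(peak_gops_reported / AREA_MM2, 2)

print(f"peak performance: {peak_gops_reported} GOPs")
print(f"area efficiency:  {area_eff} GOPs/mm^2")
```

Under these assumptions the computed values agree with the 163.8 GOPs and 37.14 GOPs/mm² quoted above. The energy-efficiency figures depend on measured power per workload, which the abstract does not give, so they are not recomputed here.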