Calculation Optimization for Convolutional Neural Networks and FPGA-based Accelerator Design Using the Parameters Sparsity

LIU Qinrang, LIU Chongyang

Citation: LIU Qinrang, LIU Chongyang. Calculation Optimization for Convolutional Neural Networks and FPGA-based Accelerator Design Using the Parameters Sparsity[J]. Journal of Electronics & Information Technology, 2018, 40(6): 1368-1374. doi: 10.11999/JEIT170819

doi: 10.11999/JEIT170819

Funds: The National Science and Technology Major Project of the Ministry of Science and Technology of China (2016ZX01012101), The National Natural Science Foundation of China (61572520, 61521003)

  • Abstract: Real-time constraints limit the application of Convolutional Neural Networks (CNN) on embedded platforms, while CNN convolution computation exhibits a considerable degree of sparsity. Exploiting this, an FPGA-based CNN accelerator is proposed to improve computation speed. First, the sparsity in CNN convolution computation is characterized; second, to make full use of parameter sparsity, the CNN convolution is converted into matrix multiplication; finally, an FPGA-based parallel matrix multiplier design is presented. Simulation results on a Virtex-7 VC707 FPGA show that, compared with a conventional CNN accelerator, the proposed design reduces computation time by 19%. The approach of simplifying CNN computation through sparsity can be implemented not only on FPGAs but also on other embedded platforms.
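The abstract's two software-level steps, lowering the convolution to a matrix multiplication and then skipping the zero-valued parameters, can be illustrated with a short sketch. The NumPy code below is a minimal illustration of the generic im2col-plus-GEMM technique with zero-weight skipping, not the authors' FPGA design; the function names, the stride-1 no-padding layout, and the 70% pruning ratio in the example are assumptions made purely for illustration.

    import numpy as np

    def im2col(x, kh, kw):
        # Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix so that a
        # stride-1, no-padding convolution becomes a single matrix multiplication.
        c, h, w = x.shape
        out_h, out_w = h - kh + 1, w - kw + 1
        cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
        row = 0
        for ci in range(c):
            for i in range(kh):
                for j in range(kw):
                    cols[row] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                    row += 1
        return cols

    def sparse_conv_as_gemm(weights, x, kh, kw):
        # weights: (num_filters, C*kh*kw) flattened kernels, many entries pruned to zero.
        # For each filter, multiply only the rows of the unfolded input that meet a
        # nonzero parameter, so multiplications by zero are skipped entirely.
        cols = im2col(x, kh, kw)
        out = np.zeros((weights.shape[0], cols.shape[1]), dtype=x.dtype)
        for f in range(weights.shape[0]):
            nz = np.flatnonzero(weights[f])
            if nz.size:
                out[f] = weights[f, nz] @ cols[nz]
        return out

    # Example: 3-channel 8x8 input, four 3x3 filters with ~70% of the weights set to zero.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8)).astype(np.float32)
    w = rng.standard_normal((4, 3 * 3 * 3)).astype(np.float32)
    w[rng.random(size=w.shape) < 0.7] = 0.0
    y = sparse_conv_as_gemm(w, x, 3, 3).reshape(4, 6, 6)  # four 6x6 output feature maps

In hardware, the same zero-skipping idea corresponds to not issuing multiply-accumulate operations for zero parameters, which is the kind of saving the abstract attributes to parameter sparsity.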
Metrics
  • Article views:  2343
  • Full-text HTML views:  387
  • PDF downloads:  445
  • Citations: 0
Publication History
  • Received:  2017-08-21
  • Revised:  2018-01-05
  • Published:  2018-06-19
