Citation: LIU Qinrang, LIU Chongyang. Calculation Optimization for Convolutional Neural Networks and FPGA-based Accelerator Design Using the Parameters Sparsity[J]. Journal of Electronics & Information Technology, 2018, 40(6): 1368-1374. doi: 10.11999/JEIT170819.
ZENG Yi, LIU Chenglin, and TAN Tieniu. Retrospect and outlook of brain-inspired intelligence research[J]. Chinese Journal of Computers, 2016, 39(1): 212-222. doi: 10.11897/SP.J.1016.2016.00212.
CHANG Liang, DENG Xiaoming, ZHOU Mingquan, et al. Convolutional neural networks in image understanding[J]. Acta Automatica Sinica, 2016, 42(9): 1300-1312. doi: 10.16383/j.aas.2016.c150800.
JI S, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231. doi: 10.1109/TPAMI.2012.59.
CHAKRADHAR S, SANKARADAS M, JAKKULA V, et al. A dynamically configurable coprocessor for convolutional neural networks[J]. ACM SIGARCH Computer Architecture News, 2010, 38(3): 247-257. doi: 10.1145/1816038.1815993.
KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[C]. International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, 2012: 1097-1105. doi: 10.1145/3065386.
SUDA N, CHANDRA V, DASIKA G, et al. Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks[C]. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, 2016: 16-25. doi: 10.1145/2847263.2847276.
QIU J, WANG J, YAO S, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, 2016: 26-35. doi: 10.1145/2847263.2847265.
ANWAR S, HWANG K, and SUNG W. Fixed point optimization of deep convolutional neural networks for object recognition[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015: 1131-1135. doi: 10.1109/ICASSP.2015.7178146.
ZHANG C, LI P, SUN G, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks[C]. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, 2015: 161-170. doi: 10.1145/2684746.2689060.
SHEN Y, FERDMAN M, and MILDER P. Maximizing CNN accelerator efficiency through resource partitioning[C]. Annual International Symposium on Computer Architecture, Toronto, ON, Canada, 2017: 535-547. doi: 10.1145/3140659.3080221.
DU Z, FASTHUBER R, CHEN T, et al. ShiDianNao: Shifting vision processing closer to the sensor[C]. Annual International Symposium on Computer Architecture, Portland, Oregon, 2015: 92-104. doi: 10.1145/2749469.2750389.
CHEN T, DU Z, SUN N, et al. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning[C]. International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, Utah, USA, 2014: 269-284. doi: 10.1145/2541940.2541967.
HADJIS S, ABUZAID F, ZHANG C, et al. Caffe con Troll: Shallow ideas to speed up deep learning[C]. Proceedings of the Fourth Workshop on Data Analytics, Melbourne, VIC, Australia, 2015: 1-4. doi: 10.1145/2799562.2799641.
YAVITS L, MORAD A, and GINOSAR R. Sparse matrix multiplication on an associative processor[J]. IEEE Transactions on Parallel and Distributed Systems, 2015, 26(11): 3175-3183. doi: 10.1109/TPDS.2014.2370055.
CHELLAPILLA K, PURI S, and SIMARD P. High performance convolutional neural networks for document processing[C]. Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule, France, 2006: 1-6.
CHETLUR S, WOOLLEY C, VANDERMERSCH P, et al. cuDNN: Efficient primitives for deep learning[C]. International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 1-9.
TIAN Xiang, ZHOU Fan, CHEN Yaowu, et al. Design of field programmable gate array based real-time double-precision floating-point matrix multiplier[J]. Journal of Zhejiang University (Engineering Science), 2008, 42(9): 1611-1615. doi: 10.3785/j.issn.1008-973X.2008.09.027.
JANG J, CHOI S B, and PRASANNA V K. Energy- and time-efficient matrix multiplication on FPGAs[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005, 13(11): 1305-1319. doi: 10.1109/TVLSI.2005.859562.
KUMAR V B Y, JOSHI S, PATKAR S B, et al. FPGA based high performance double-precision matrix multiplication[J]. International Journal of Parallel Programming, 2010, 38(3/4): 322-338. doi: 10.1109/VLSI.Design.2009.13.
DONAHUE J, JIA Y, VINYALS O, et al. DeCAF: A deep convolutional activation feature for generic visual recognition[C]. International Conference on Machine Learning, Beijing, China, 2014: 647-655.
WARDEN P. Why GEMM is at the heart of deep learning[OL]. https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/.