Citation: | HE Yukai, XIE Tongxin, ZHU Zhenhua, GAO Lan, LI Bing. Collaborative Optimization Strategies for Matrix Multiplication-Accumulation Operators on Commercial Processing-In-Memory Architectures[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250364 |
[1] |
GHOLAMI A, YAO Zhewei, KIM S, et al. AI and memory wall[J]. IEEE Micro, 2024, 44(3): 33–39. doi: 10.1109/MM.2024.3373763.
|
[2] |
CHI Ping, LI Shuangchen, XU Cong, et al. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory[J]. ACM SIGARCH Computer Architecture News, 2016, 44(3): 27–39. doi: 10.1145/3007787.3001140.
|
[3] |
DAKKAK A, LI Cheng, XIONG Jinjun, et al. Accelerating reduction and scan using tensor core units[C]. Proceedings of the ACM International Conference on Supercomputing, Phoenix, USA, 2019: 46–57. doi: 10.1145/3330345.3331057.
|
[4] |
ALIAN M, MIN S W, ASGHARIMOGHADDAM H, et al. Application-transparent near-memory processing architecture with memory channel network[C]. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, Fukuoka, Japan, 2018: 802–814. doi: 10.1109/MICRO.2018.00070.
|
[5] |
GÓMEZ-LUNA J, EL HAJJ I, FERNANDEZ I, et al. Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture[J]. arXiv: 2105.03814, 2021: 1–25. doi: 10.48550/arXiv.2105.03814. (查阅网上资料,请核对文献类型及格式).
|
[6] |
KIM J and KIM Y. HBM: Memory solution for bandwidth-hungry processors[C]. Proceedings of 2014 IEEE Hot Chips 26 Symposium, Cupertino, USA, 2014: 1–24. doi: 10.1109/HOTCHIPS.2014.7478812.
|
[7] |
LEE S, KANG S H, LEE J, et al. Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product[C]. Proceedings of 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, Valencia, Spain, 2021: 43–56. doi: 10.1109/ISCA52012.2021.00013.
|
[8] |
KIM D, KUNG J, CHAI S, et al. Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory[J]. ACM SIGARCH Computer Architecture News, 2016, 44(3): 380–392. doi: 10.1145/3007787.3001178.
|
[9] |
ASGARI B, HADIDI R, CAO Jiashen, et al. FAFNIR: Accelerating sparse gathering by using efficient near-memory intelligent reduction[C]. Proceedings of 2021 IEEE International Symposium on High-Performance Computer Architecture, Seoul, Korea (South), 2021: 908–920. doi: 10.1109/HPCA51647.2021.00080.
|
[10] |
WANG Haoyang, ZHANG Shengbing, FAN Xiaoya, et al. NDPGNN: A near-data processing architecture for GNN training and inference acceleration[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, 43(11): 3997–4008. doi: 10.1109/TCAD.2024.3446871.
|
[11] |
LEE S, KIM K, OH S, et al. A 1ynm 1.25V 8Gb, 16Gb/s/pin GDDR6-based accelerator-in-memory supporting 1TFLOPS MAC operation and various activation functions for deep-learning applications[C]. Proceedings of 2022 IEEE International Solid-State Circuits Conference, San Francisco, USA, 2022: 1–3. doi: 10.1109/ISSCC42614.2022.9731711.
|
[12] |
DAI Guohao, ZHU Zhenhua, FU Tianyu, et al. DIMMining: Pruning-efficient and parallel graph mining on near-memory-computing[C]. Proceedings of the 49th Annual International Symposium on Computer Architecture, New York, USA, 2022: 130–145. doi: 10.1145/3470496.3527388.
|
[13] |
KE Liu, GUPTA U, CHO B Y, et al. RecNMP: Accelerating personalized recommendation with near-memory processing[C]. Proceedings of 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, Valencia, Spain, 2020: 790–803. doi: 10.1109/ISCA45697.2020.00070.
|
[14] |
WILKINSON F, COCKREAN A, LIN Weichen, et al. Assessing the GPU offload threshold of GEMM and GEMV kernels on modern heterogeneous HPC systems[C]. Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, USA, 2024: 1481–1495. doi: 10.1109/SCW63240.2024.00188.
|
[15] |
HONG Ke, DAI Guohao, XU Jiaming, et al. FlashDecoding++: Faster large language model inference on GPUs[C]. Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024: 1–15. (查阅网上资料, 未找到本条文献信息, 请确认).
|
[16] |
IBRAHIM M A, ISLAM M, and AGA S. PIMnast: Balanced data placement for GEMV acceleration with processing-in-memory[C]. Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, USA, 2024: 970–981. doi: 10.1109/SCW63240.2024.00137.
|
[17] |
XIE Tongxin, ZHU Zhenhua, LI Bing, et al. UniNDP: A unified compilation and simulation tool for near DRAM processing architectures[C]. Proceedings of 2025 IEEE International Symposium on High Performance Computer Architecture, Las Vegas, USA, 2025: 624–640. doi: 10.1109/HPCA61900.2025.00054.
|
[18] |
Samsung Advanced Institute of Technology. PIMSimulator[EB/OL]. https://github.com/SAITPublic/PIMSimulator. (查阅网上资料,未找到对应的作者和年份信息,请确认).
|