A System-level Exploration and Evaluation Simulator for chiplet-based CPU

ZHANG Congwu; LIU Ao; ZHANG Ke; CHANG Yisong; BAO Yungang

doi:10.11999/JEIT240299

Volume 46 Issue 12

Dec. 2024

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2024 > 46(12): 4575-4588

ZHANG Congwu, LIU Ao, ZHANG Ke, CHANG Yisong, BAO Yungang. A System-level Exploration and Evaluation Simulator for chiplet-based CPU[J]. Journal of Electronics & Information Technology, 2024, 46(12): 4575-4588. doi: 10.11999/JEIT240299

Citation:

ZHANG Congwu, LIU Ao, ZHANG Ke, CHANG Yisong, BAO Yungang. A System-level Exploration and Evaluation Simulator for chiplet-based CPU[J]. Journal of Electronics & Information Technology, 2024, 46(12): 4575-4588. doi: 10.11999/JEIT240299

Citation:

PDF( 4968 KB)

A System-level Exploration and Evaluation Simulator for chiplet-based CPU

doi: 10.11999/JEIT240299 cstr: 32379.14.JEIT240299

ZHANG Congwu^{1, 2},
LIU Ao^{1, 3},
ZHANG Ke^{1, 2
,
,},
CHANG Yisong^{1, 2},
BAO Yungang^{1, 2}

1.
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2.
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
3.
Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450003, China

Funds: The Strategic Priority Research Program of Chinese Academy of Sciences (XDA0320000, XDA0320300), The MajorProgram of the National Natural Science Foundation of China (62090020)

Received Date: 2024-04-19
Rev Recd Date: 2024-11-11

Available Online: 2024-11-19

Publish Date: 2024-12-01

Abstract

Abstract

As Moore’s Law comes to an end, it is more and more difficult to improve the chip manufacturing process, and chiplet technology has been widely adopted to improve the chip performance. However, new design parameters introduced into the chiplet architecture pose significant challenges to the computer architecture simulator. To fully support exploration and evaluation of chiplet architecture, System-level Exploration and Evaluation simulator for Chiplet (SEEChiplet), a framework based on gem5 simulator, is developed in this paper. Firstly, three design parameters concerned about chiplet chip design are summarized in this paper, including: (1) chiplet cache system design; (2) Packaging simulation; (3) Interconnection networks between chiplet. Secondly, in view of the above three design parameters, in this paper: (1) a new private last level cache system is designed and implemented to expand the cache system design space; (2) existing gem5 global directory is modified to adapt to new private Last Level Cache (LLC) system; (3) two common packaging methods of chiplet and inter-chiplet network are modeled. Finally, a chiplet-based processor is simulated with PARSEC 3.0 benchmark program running on it, which proves that SEEChiplet can explore and evaluate the design space of chiplet.
- Chiplet,
- Design space exploration,
- Computer architecture simulator,
- Cache system

FullText(HTML)

References(42)

References

[1]	MOORE G E. Cramming more components onto integrated circuits[J]. Electronics, 1965, 38(8): 114–117.
[2]	DENNARD R H, GAENSSLE F H, YU H N, et al. Design of ion-implanted MOSFET's with very small physical dimensions[J]. IEEE Journal of Solid-State Circuits, 1974, 9(5): 256–268. doi: 10.1109/JSSC.1974.1050511.
[3]	HAN Yinhe, XU Haobo, LU Meixuan, et al. The big chip: Challenge, model and architecture[J]. Fundamental Research, 2023, S2667325823003709. doi: 10.1016/j.fmre.2023.10.020.
[4]	CAI Jingwei, WU Zuotong, PENG Sen, et al. Gemini: Mapping and architecture co-exploration for large-scale DNN Chiplet accelerators[C]. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Edinburgh, United Kingdom, 2024: 156–171. doi: 10.1109/HPCA57654.2024.00022.
[5]	陈云霁, 蔡一茂, 汪玉, 等. 集成电路未来发展与关键问题—第347期"双清论坛(青年)"学术综述[J]. 中国科学: 信息科学, 2024, 54(1): 1–15. doi: 10.1360/SSI-2023-0356. CHEN Yunji, CAI Yimao, WANG Yu, et al. Integrated circuit technology: Future development and key issues–review of the 347th "Shuangqing Forum (Youth)"[J]. Scientia Sinica Informationis, 2024, 54(1): 1–15. doi: 10.1360/SSI-2023-0356.
[6]	项少林, 郭茂, 蒲菠, 等. Chiplet技术发展现状[J]. 科技导报, 2023, 41(19): 113–131. doi: 10.3981/j.issn.1000-7857.2023.19.013. XIANG Shaolin, GUO Mao, PU Bo, et al. Overview of the development status of Chiplet technology[J]. Science & Technology Review, 2023, 41(19): 113–131. doi: 10.3981/j.issn.1000-7857.2023.19.013.
[7]	厉佳瑶, 张琨, 潘权. Chiplet技术: 拓展芯片设计的新边界[J]. 集成电路与嵌入式系统, 2024, 24(2): 1–9. LI Jiayao, ZHANG Kun, and PAN Quan. Chiplet: Expanding the innovative boundaries of chip design[J]. Integrated Circuits and Embedded Systems, 2024, 24(2): 1–9.
[8]	MA Xiaohan, WANG Ying, WANG Yujie, et al. Survey on Chiplets: Interface, interconnect and integration methodology[J]. CCF Transactions on High Performance Computing, 2022, 4(1): 43–52. doi: 10.1007/s42514-022-00093-0.
[9]	SUGGS D, SUBRAMONY M, and BOUVIER D. The AMD “Zen 2” processor[J]. IEEE Micro, 2020, 40(2): 45–52. doi: 10.1109/MM.2020.2974217.
[10]	NAFFZIGER S, LEPAK K, PARASCHOU M, et al. 2.2 AMD Chiplet architecture for high-performance server and desktop products[C]. 2020 IEEE International Solid-State Circuits Conference - (ISSCC), San Francisco, USA, 2020: 44–45. doi: 10.1109/ISSCC19947.2020.9063103.
[11]	EVERS M, BARNES L, and CLARK M. The AMD next-generation “Zen 3” Core[J]. IEEE Micro, 2022, 42(3): 7–12. doi: 10.1109/MM.2022.3152788.
[12]	MUNGER B, WILCOX K, SNIDERMAN J, et al. Zen 4: The AMD 5nm 5.7GHz x86-64 microprocessor core[C]. 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2023: 38–39. doi: 10.1109/ISSCC42615.2023.10067540.
[13]	GIANOS C. Architecting for flexibility and value with next gen Intel® Xeon® processors[C]. 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, USA, 2023: 1–15. doi: 10.1109/HCS59251.2023.10254694.
[14]	ESPOSITO B. Intel Agilex® 9 direct RF-series FPGAs with integrated 64 Gsps data converters[C]. 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, USA, 2023: 1–35. doi: 10.1109/HCS59251.2023.10254707.
[15]	VENTANA MICRO. Veyron V1 data center-class RISC-V processor[C]. 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, USA, 2023: 1–16. doi: 10.1109/HCS59251.2023.10254710.
[16]	CHIRKOV G and WENTZLAFF D. Seizing the bandwidth scaling of on-package interconnect in a post-Moore’s law world[C]. Proceedings of the 37th International Conference on Supercomputing, Orlando, USA, 2023: 410–422. doi: 10.1145/3577193.3593702.
[17]	YANG Chongyi, ZHANG Zhendong, WANG Xiaohang, et al. Adaptive caching policies for Chiplet systems based on reinforcement learning[C]. 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10181966.
[18]	GADE S H, SINHA M, KUMAR M, et al. Scalable hybrid cache coherence using emerging links for Chiplet architectures[C]. 2022 35th International Conference on VLSI Design and 2022 21st International Conference on Embedded Systems (VLSID), Bangalore, India, 2022: 92–97. doi: 10.1109/VLSID2022.2022.00029.
[19]	MEDINA R, KEIN J, ANSALONI G, et al. System-level exploration of in-package wireless communication for multi-Chiplet platforms[C]. Proceedings of the 28th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 2023: 561–566. doi: 10.1145/3566097.3567952.
[20]	ZHU Mingcan, SHAHAB A, KATSARAKIS A, et al. Invalidate or update? Revisiting coherence for tomorrow's cache hierarchies[C]. 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), Atlanta, USA, 2021: 226–241. doi: 10.1109/PACT52795.2021.00024.
[21]	SHAHAB A, ZHU Mingcan, MARGARITOV A, et al. Farewell my shared LLC! A case for private die-stacked DRAM caches for servers[C]. 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka, Japan, 2018: 559–572. doi: 10.1109/MICRO.2018.00052.
[22]	BHARADWAJ S, YIN Jieming, BECKMANN B, et al. Kite: A family of heterogeneous interposer topologies enabled via accurate interconnect modeling[C]. 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2020: 1–6. doi: 10.1109/DAC18072.2020.9218539.
[23]	IFF P, BESTA M, CAVALCANTE M, et al. HexaMesh: Scaling to hundreds of Chiplets with an optimized Chiplet arrangement[C]. 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2023: 1–6. doi: 10.1109/DAC56929.2023.10248006.
[24]	FU Yaosheng and WENTZLAFF D. PriME: A parallel and distributed simulator for thousand-core chips[C]. 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Monterey, USA, 2014: 116–125. doi: 10.1109/ISPASS.2014.6844467.
[25]	LOWE-POWER J, AHMAD A M, AKRAM A, et al. The gem5 simulator: Version 20.0+[EB/OL]. https://arxiv.org/abs/2007.03152, 2020.
[26]	UBAL R, JANG B, MISTRY P, et al. Multi2Sim: A simulation framework for CPU-GPU computing[C]. Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques. Minneapolis, USA, 2012: 335–344. doi: 10.1145/2370816.2370865.
[27]	QURESHI Y M, SIMON W A, ZAPATER M, et al. gem5-X: A many-core heterogeneous simulation platform for architectural exploration and optimization[J]. ACM Transactions on Architecture and Code Optimization (TACO), 2021, 18(4): 44. doi: 10.1145/3461662.
[28]	HARDAVELLAS N, SOMOGYI S, WENISCH T F, et al. SimFlex: A fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture[J]. ACM SIGMETRICS Performance Evaluation Review, 2004, 31(4): 31–34. doi: 10.1145/1054907.1054914.
[29]	JIANG Nan, BECKER U D, MICHELOGIANNAKIS G, et al. A detailed and flexible cycle-accurate network-on-chip simulator[C]. 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, USA, 2013: 86–96. doi: 10.1109/ISPASS.2013.6557149.
[30]	BRKIĆ I R and JEFFREY M C M. Disintegrating manycores: Which applications lose and why?[C]. Proceedings of the 16th International Workshop on Network on Chip Architectures, Toronto, Canada, 2023: 3–8. doi: 10.1145/3610396.3618090.
[31]	JEFFREY M C, SUBRAMANIAN S, YAN Cong, et al. A scalable architecture for ordered parallelism[C]. 2015 48th International Symposium on Microarchitecture (MICRO), Waikiki, USA, 2015: 228–241. doi: 10.1145/2830772.2830777.
[32]	ZHI Haocong, XU Xianuo, HAN Weijian, et al. A methodology for simulating multi-Chiplet systems using open-source simulators[C]. Proceedings of the Eight Annual ACM International Conference on Nanoscale Computing and Communication, New York, NY, USA, 2021: 18. doi: 10.1145/3477206.3477459.
[33]	ORENES-VERA M, TURECI E, MARTONOSI M, et al. DCRA: A distributed Chiplet-based reconfigurable architecture for irregular applications[EB/OL]. https://arxiv.org/abs/2311.15443, 2024.
[34]	ORENES-VERA M, TURECI E, MARTONOSI M, et al. MuchiSim: A simulation framework for design exploration of multi-chip Manycore systems[C]. Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Indianapolis, USA, 2024: 48–60. doi: 10.1109/ISPASS61541.2024.00015.
[35]	LI Xingyu. High-performance FPGA-accelerated Chiplet modeling[D]. [Master dissertation], University of California, Berkeley, 2022.
[36]	CHIRKOV G and WENTZLAFF D. SMAPPIC: Scalable multi-FPGA architecture prototype platform in the cloud[C]. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Vancouver, Canada, 2023: 733–746. doi: 10.1145/3575693.3575753.
[37]	ZHAN Xusheng, BAO Yungang, BIENIA C, et al. PARSEC3.0: A multicore benchmark suite with network stacks and SPLASH-2X[J]. ACM SIGARCH Computer Architecture News, 2017, 44(5): 1–16. doi: 10.1145/3053277.3053279.
[38]	HARDAVELLAS N, FERDMAN M, FALSAFI B, et al. Reactive NUCA: Near-optimal block placement and replication in distributed caches[J]. ACM SIGARCH Computer Architecture News, 2009, 37(3): 184–195. doi: 10.1145/1555815.1555779.
[39]	AWASTHI M, SUDAN K, BALASUBRAMONIAN R, et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches[C]. 2009 IEEE 15th International Symposium on High Performance Computer Architecture, Raleigh, USA, 2009: 250–261. doi: 10.1109/HPCA.2009.4798260.
[40]	KIM C, BURGER D, and KECKLER S W, et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches[C]. Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, USA, 2002: 211–222. doi: 10.1145/605397.605420.
[41]	LI Chengeng, JIANG Fan, CHEN Shixi, et al. Accelerating cache coherence in Manycore processor through silicon photonic Chiplet[C]. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD'22), San Diego, USA, 2022: 43. doi: 10.1145/3508352.3549338.
[42]	CUBERO-CASCANTE J, ZURSTRAßEN N, NÖLLER J, et al. Parti-gem5: Gem5’s timing mode parallelised[C]. 23rd International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece, 2023: 177–192. doi: 1 0.1007/978-3-031-46077-7_12.