Advanced Search
Turn off MathJax
Article Contents
ZHANG Congwu, LIU Ao, ZHANG Ke, CHANG Yisong, BAO Yungang. A System-level Exploration and Evaluation Simulator for chiplet-based CPU[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240299
Citation: ZHANG Congwu, LIU Ao, ZHANG Ke, CHANG Yisong, BAO Yungang. A System-level Exploration and Evaluation Simulator for chiplet-based CPU[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240299

A System-level Exploration and Evaluation Simulator for chiplet-based CPU

doi: 10.11999/JEIT240299
Funds:  The Strategic Priority Research Program of Chinese Academy of Sciences (XDA0320000, XDA0320300), The MajorProgram of the National Natural Science Foundation of China (62090020)
  • Received Date: 2024-04-19
  • Rev Recd Date: 2024-11-11
  • Available Online: 2024-11-19
  • As Moore’s Law comes to an end, it is more and more difficult to improve the chip manufacturing process, and chiplet technology has been widely adopted to improve the chip performance. However, new design parameters introduced into the chiplet architecture pose significant challenges to the computer architecture simulator. To fully support exploration and evaluation of chiplet architecture, SEEChiplet (System-level Exploration and Evaluation simulator for chiplet), a framework based on gem5 simulator, is developed in this paper. Firstly, three design parameters concerned about chiplet chip design are summarized in this paper, including: (1) chiplet cache system design; (2) Packaging simulation; (3) Interconnection networks between chiplet. Secondly, in view of the above three design parameters, in this paper: (1) a new private last level cache system are designed and implemented to expand the cache system design space; (2) existing gem5 global directory is modified to adapt to new private Last Level cache (LLC) system; (3) two common packaging methods of chiplet and inter-chiplet network are modeled. Finally, a chiplet-based processor is simulated with PARSEC 3.0 benchmark program running on it, which proves that SEEChiplet can explore and evaluate the design space of chiplet.
  • loading
  • [1]
    MOORE G E. Cramming more components onto integrated circuits[J]. Electronics, 1965, 38(8): 114–117.
    [2]
    DENNARD R H, GAENSSLE F H, YU H N, et al. Design of ion-implanted MOSFET's with very small physical dimensions[J]. IEEE Journal of Solid-State Circuits, 1974, 9(5): 256–268. doi: 10.1109/JSSC.1974.1050511.
    [3]
    HAN Yinhe, XU Haobo, LU Meixuan, et al. The big chip: Challenge, model and architecture[J]. Fundamental Research, 2023, S2667325823003709. doi: 10.1016/j.fmre.2023.10.020. (查阅网上资料,未找到本条文献卷期信息,请补充) .
    [4]
    CAI Jingwei, WU Zuotong, PENG Sen, et al. Gemini: Mapping and architecture co-exploration for large-scale DNN Chiplet accelerators[C]. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Edinburgh, United Kingdom, 2024: 156–171. doi: 10.1109/HPCA57654.2024.00022.
    [5]
    陈云霁, 蔡一茂, 汪玉, 等. 集成电路未来发展与关键问题—第347期"双清论坛(青年)"学术综述[J]. 中国科学: 信息科学, 2024, 54(1): 1–15. doi: 10.1360/SSI-2023-0356.

    CHEN Yunji, CAI Yimao, WANG Yu, et al. Integrated circuit technology: Future development and key issues–review of the 347th "Shuangqing Forum (Youth)"[J]. Scientia Sinica Informationis, 2024, 54(1): 1–15. doi: 10.1360/SSI-2023-0356.
    [6]
    项少林, 郭茂, 蒲菠, 等. Chiplet技术发展现状[J]. 科技导报, 2023, 41(19): 113–131. doi: 10.3981/j.issn.1000-7857.2023.19.013.

    XIANG Shaolin, GUO Mao, PU Bo, et al. Overview of the development status of Chiplet technology[J]. Science & Technology Review, 2023, 41(19): 113–131. doi: 10.3981/j.issn.1000-7857.2023.19.013.
    [7]
    厉佳瑶, 张琨, 潘权. Chiplet技术: 拓展芯片设计的新边界[J]. 集成电路与嵌入式系统, 2024, 24(2): 1–9.

    LI Jiayao, ZHANG Kun, and PAN Quan. Chiplet: Expanding the innovative boundaries of chip design[J]. Integrated Circuits and Embedded Systems, 2024, 24(2): 1–9.
    [8]
    MA Xiaohan, WANG Ying, WANG Yujie, et al. Survey on Chiplets: Interface, interconnect and integration methodology[J]. CCF Transactions on High Performance Computing, 2022, 4(1): 43–52. doi: 10.1007/s42514-022-00093-0.
    [9]
    SUGGS D, SUBRAMONY M, and BOUVIER D. The AMD “Zen 2” processor[J]. IEEE Micro, 2020, 40(2): 45–52. doi: 10.1109/MM.2020.2974217.
    [10]
    NAFFZIGER S, LEPAK K, PARASCHOU M, et al. 2.2 AMD Chiplet architecture for high-performance server and desktop products[C]. 2020 IEEE International Solid-State Circuits Conference - (ISSCC), San Francisco, USA, 2020: 44–45. doi: 10.1109/ISSCC19947.2020.9063103.
    [11]
    EVERS M, BARNES L, and CLARK M. The AMD next-generation “Zen 3” Core[J]. IEEE Micro, 2022, 42(3): 7–12. doi: 10.1109/MM.2022.3152788.
    [12]
    MUNGER B, WILCOX K, SNIDERMAN J, et al. Zen 4: The AMD 5nm 5.7GHz x86-64 microprocessor core[C]. 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2023: 38–39. doi: 10.1109/ISSCC42615.2023.10067540.
    [13]
    GIANOS C. Architecting for flexibility and value with next gen Intel® Xeon® processors[C]. 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, USA, 2023: 1–15. doi: 10.1109/HCS59251.2023.10254694.
    [14]
    ESPOSITO B. Intel Agilex® 9 direct RF-series FPGAs with integrated 64 Gsps data converters[C]. 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, USA, 2023: 1–35. doi: 10.1109/HCS59251.2023.10254707.
    [15]
    VENTANA MICRO. Veyron V1 data center-class RISC-V processor[C]. 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, USA, 2023: 1–16. doi: 10.1109/HCS59251.2023.10254710. (查阅网上资料,未找到本条文献作者名信息,请确认) .
    [16]
    CHIRKOV G and WENTZLAFF D. Seizing the bandwidth scaling of on-package interconnect in a post-Moore's law world[C]. Proceedings of the 37th International Conference on Supercomputing, Orlando, USA, 2023: 410–422. doi: 10.1145/3577193.3593702.
    [17]
    YANG Chongyi, ZHANG Zhendong, WANG Xiaohang, et al. Adaptive caching policies for Chiplet systems based on reinforcement learning[C]. 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10181966.
    [18]
    GADE S H, SINHA M, KUMAR M, et al. Scalable hybrid cache coherence using emerging links for Chiplet architectures[C]. 2022 35th International Conference on VLSI Design and 2022 21st International Conference on Embedded Systems (VLSID), Bangalore, India, 2022: 92–97. doi: 10.1109/VLSID2022.2022.00029.
    [19]
    MEDINA R, KEIN J, ANSALONI G, et al. System-level exploration of in-package wireless communication for multi-Chiplet platforms[C]. Proceedings of the 28th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 2023: 561–566. doi: 10.1145/3566097.3567952.
    [20]
    ZHU Mingcan, SHAHAB A, KATSARAKIS A, et al. Invalidate or update? Revisiting coherence for tomorrow's cache hierarchies[C]. 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), Atlanta, USA, 2021: 226–241. doi: 10.1109/PACT52795.2021.00024.
    [21]
    SHAHAB A, ZHU Mingcan, MARGARITOV A, et al. Farewell my shared LLC! A case for private die-stacked DRAM caches for servers[C]. 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka, Japan, 2018: 559–572. doi: 10.1109/MICRO.2018.00052.
    [22]
    BHARADWAJ S, YIN Jieming, BECKMANN B, et al. Kite: A family of heterogeneous interposer topologies enabled via accurate interconnect modeling[C]. 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2020: 1–6. doi: 10.1109/DAC18072.2020.9218539.
    [23]
    IFF P, BESTA M, CAVALCANTE M, et al. HexaMesh: Scaling to hundreds of Chiplets with an optimized Chiplet arrangement[C]. 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2023: 1–6. doi: 10.1109/DAC56929.2023.10248006.
    [24]
    FU Yaosheng and WENTZLAFF D. PriME: A parallel and distributed simulator for thousand-core chips[C]. 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Monterey, USA, 2014: 116–125. doi: 10.1109/ISPASS.2014.6844467.
    [25]
    LOWE-POWER J, AHMAD A M, AKRAM A, et al. The gem5 simulator: Version 20.0+[EB/OL]. https://arxiv.org/abs/2007.03152, 2020.
    [26]
    UBAL R, JANG B, MISTRY P, et al. Multi2Sim: A simulation framework for CPU-GPU computing[C]. Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques. Minneapolis, USA, 2012: 335–344. doi: 10.1145/2370816.2370865.
    [27]
    QURESHI Y M, SIMON W A, ZAPATER M, et al. gem5-X: A many-core heterogeneous simulation platform for architectural exploration and optimization[J]. ACM Transactions on Architecture and Code Optimization (TACO), 2021, 18(4): 44. doi: 10.1145/3461662.
    [28]
    HARDAVELLAS N, SOMOGYI S, WENISCH T F, et al. SimFlex: A fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture[J]. ACM SIGMETRICS Performance Evaluation Review, 2004, 31(4): 31–34. doi: 10.1145/1054907.1054914.
    [29]
    JIANG Nan, BECKER U D, MICHELOGIANNAKIS G, et al. A detailed and flexible cycle-accurate network-on-chip simulator[C]. 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, USA, 2013: 86–96. doi: 10.1109/ISPASS.2013.6557149.
    [30]
    BRKIĆ I R and JEFFREY M C M. Disintegrating manycores: Which applications lose and why?[C]. Proceedings of the 16th International Workshop on Network on Chip Architectures, Toronto, Canada, 2023: 3–8. doi: 10.1145/3610396.3618090.
    [31]
    JEFFREY M C, SUBRAMANIAN S, YAN Cong, et al. A scalable architecture for ordered parallelism[C]. 2015 48th International Symposium on Microarchitecture (MICRO), Waikiki, USA, 2015: 228–241. doi: 10.1145/2830772.2830777.
    [32]
    ZHI Haocong, XU Xianuo, HAN Weijian, et al. A methodology for simulating multi-Chiplet systems using open-source simulators[C]. Proceedings of the Eight Annual ACM International Conference on Nanoscale Computing and Communication, New York, NY, USA, 2021: 18. doi: 10.1145/3477206.3477459. (查阅网上资料,未找到本条文献出版地信息,请确认) .
    [33]
    ORENES-VERA M, TURECI E, MARTONOSI M, et al. DCRA: A distributed Chiplet-based reconfigurable architecture for irregular applications[EB/OL]. https://arxiv.org/abs/2311.15443, 2024.
    [34]
    ORENES-VERA M, TURECI E, MARTONOSI M, et al. MuchiSim: A simulation framework for design exploration of multi-chip Manycore systems[C]. Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Indianapolis, USA, 2024: 48–60. doi: 10.1109/ISPASS61541.2024.00015.
    [35]
    LI Xingyu. High-performance FPGA-accelerated Chiplet modeling[D]. [Master dissertation], University of California, Berkeley, 2022.
    [36]
    CHIRKOV G and WENTZLAFF D. SMAPPIC: Scalable multi-FPGA architecture prototype platform in the cloud[C]. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Vancouver, Canada, 2023: 733–746. doi: 10.1145/3575693.3575753.
    [37]
    ZHAN Xusheng, BAO Yungang, BIENIA C, et al. PARSEC3.0: A multicore benchmark suite with network stacks and SPLASH-2X[J]. ACM SIGARCH Computer Architecture News, 2017, 44(5): 1–16. doi: 10.1145/3053277.3053279.
    [38]
    HARDAVELLAS N, FERDMAN M, FALSAFI B, et al. Reactive NUCA: Near-optimal block placement and replication in distributed caches[J]. ACM SIGARCH Computer Architecture News, 2009, 37(3): 184–195. doi: 10.1145/1555815.1555779.
    [39]
    AWASTHI M, SUDAN K, BALASUBRAMONIAN R, et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches[C]. 2009 IEEE 15th International Symposium on High Performance Computer Architecture, Raleigh, USA, 2009: 250–261. doi: 10.1109/HPCA.2009.4798260.
    [40]
    KIM C, BURGER D, and KECKLER S W, et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches[C]. Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, USA, 2002: 211–222. doi: 10.1145/605397.605420.
    [41]
    LI Chengeng, JIANG Fan, CHEN Shixi, et al. Accelerating cache coherence in Manycore processor through silicon photonic Chiplet[C]. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD'22), San Diego, USA, 2022: 43. doi: 10.1145/3508352.3549338.
    [42]
    CUBERO-CASCANTE J, ZURSTRAßEN N, NÖLLER J, et al. Parti-gem5: Gem5’s timing mode parallelised[C]. 23rd International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece, 2023: 177–192. doi: 1 0.1007/978-3-031-46077-7_12.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(12)  / Tables(6)

    Article Metrics

    Article views (7) PDF downloads(0) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return