A 64 Gb/s Single-Ended Simultaneous Bi-Directional Transceiver for Die-to-Die Interfaces

WANG Zhifei, HUANG Zhiwen, YE Tianchen, YE Bingyi, LI Fangzhu, WANG Wei, YU Dunshan, GAI Weixin

Citation: WANG Zhifei, HUANG Zhiwen, YE Tianchen, YE Bingyi, LI Fangzhu, WANG Wei, YU Dunshan, GAI Weixin. A 64 Gb/s Single-Ended Simultaneous Bi-Directional Transceiver for Die-to-Die Interfaces[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250506

doi: 10.11999/JEIT250506 cstr: 32379.14.JEIT250506
Funding: Beijing Major Science and Technology Project (Z221100007722019)
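    Author biographies:

    WANG Zhifei: male, Ph.D. student; research interests include ultra-high-speed transceiver circuit design.

    HUANG Zhiwen: male, Ph.D. student; research interests include ultra-high-speed transceiver circuit design.

    YE Tianchen: male, Ph.D. student; research interests include ultra-high-speed transceiver circuit design.

    YE Bingyi: male, junior research fellow; research interests include ultra-high-speed transceiver circuit design.

    LI Fangzhu: female, master's student; research interests include ultra-high-speed transceiver circuit design.

    WANG Wei: male, professor; research interests include microsystem integration and thermal management.

    YU Dunshan: male, professor; research interests include digital signal processing circuit design.

    GAI Weixin: male, professor; research interests include ultra-high-speed transceiver circuit design.

    Corresponding author:

    GAI Weixin wgai@pku.edu.cn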

  • CLC number: TN432

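  • Abstract: Chiplet integration packages multiple chiplets with different functions and process nodes together, opening a new path for high-performance chips. The die-to-die interface circuits that carry data between chiplets are critical: their bandwidth density, bit error rate (BER), and power consumption directly affect key metrics such as compute capability and data throughput. To address the signal reflection and crosstalk that accompany higher bandwidth density, this paper proposes a simultaneous bi-directional (full-duplex) transceiver with echo, near-end crosstalk (NEXT), and far-end crosstalk (FEXT) cancellation, and validates it with a test chip fabricated in a 28 nm process. Full-duplex signaling raises the per-lane data rate; a dynamic threshold decision technique (D-VTH) separates the transmitted and received signals and cancels echo and NEXT; and balancing the capacitive and inductive coupling between channels cancels FEXT. In addition, a delay-matched source-synchronous clocking structure reduces the jitter of the clock relative to the data and improves the noise margin, while standing-wave clock and reset-signal transmission circuits synchronize the transmitted signals and improve the accuracy of NEXT cancellation. Measurements show that, over a 3 mm unshielded interconnect channel, the transceiver achieves a BER below 10⁻¹⁶ at a per-lane data rate of 64 Gb/s and a bandwidth density of 10.5 Tb/s/mm, with an energy efficiency of 1.21 pJ/b.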
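The signal-separation idea in the abstract can be illustrated with a toy model. The sketch below is not the circuit from the paper: it assumes an idealized line whose pad voltage is the half-sum of the local and remote bits plus an echo of past local bits, and it models only echo, not NEXT. The helper names (`fir_past`, `line_voltage`, `dynamic_threshold`) and the echo tap values are hypothetical. It shows why a threshold that tracks the known local data can recover the remote data where a fixed threshold cannot.

```python
import random

def fir_past(bits, taps, n):
    """Weighted sum of the bits sent before index n (tap k weights bit n-k)."""
    return sum(c * bits[n - k] for k, c in enumerate(taps, start=1) if n - k >= 0)

def line_voltage(local, remote, echo_taps):
    """Idealized pad voltage: half-sum of both drivers plus echo of past local bits."""
    return [0.5 * local[n] + 0.5 * remote[n] + fir_past(local, echo_taps, n)
            for n in range(len(local))]

def dynamic_threshold(local, fir_taps):
    """Threshold that follows the local data: mid-level offset, current bit, FIR of past bits."""
    return [0.25 + 0.5 * local[n] + fir_past(local, fir_taps, n)
            for n in range(len(local))]

def slice_bits(voltages, thresholds):
    """Compare each sample against its own threshold."""
    return [1 if v > t else 0 for v, t in zip(voltages, thresholds)]

random.seed(0)
N = 1000
local = [random.randint(0, 1) for _ in range(N)]
remote = [random.randint(0, 1) for _ in range(N)]
echo_taps = [0.15, -0.08]  # hypothetical echo coefficients, not from the paper

v = line_voltage(local, remote, echo_taps)
fixed = slice_bits(v, [0.5] * N)                           # fixed mid-level threshold
tracked = slice_bits(v, dynamic_threshold(local, echo_taps))  # threshold tracks local data

errors_fixed = sum(a != b for a, b in zip(fixed, remote))
errors_tracked = sum(a != b for a, b in zip(tracked, remote))
print("errors with fixed threshold:  ", errors_fixed)
print("errors with dynamic threshold:", errors_tracked)
```

In this model the difference between the pad voltage and the tracked threshold reduces to ±0.25 depending only on the remote bit, so the dynamic threshold makes no errors, while the fixed threshold fails whenever the two ends send different bits. In a real link the FIR taps would have to be adapted to the channel rather than copied from a known echo response.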
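  • Figure 1  Source-synchronous full-duplex interconnect system architecture

    Figure 2  Principle of the dynamic threshold decision technique

    Figure 3  Crosstalk between adjacent channels

    Figure 4  Delay-matched source-synchronous clocking structure

    Figure 5  Single-lane transceiver circuit architecture

    Figure 6  DAC circuit structure

    Figure 7  Simulated DAC output linearity

    Figure 8  Simulated eye diagrams for DACs with different linearity

    Figure 9  Driver circuit structure

    Figure 10  Slicer circuit structure

    Figure 11  Slicer sensitivity versus input common-mode voltage

    Figure 12  On-chip channel structure and frequency response

    Figure 13  Simulated adaptation of the FIR filter coefficients and the data waveform after the coefficients settle

    Figure 14  Source-synchronous clock circuit structure

    Figure 15  Clock-data delay matching and PSIJ simulation

    Figure 16  Conventional clock and divider-reset signal transmission circuit

    Figure 17  Inter-lane synchronous clock transmission circuit structure

    Figure 18  Simulated amplitude and phase of the standing-wave clock transmission circuit

    Figure 19  Alignment of the D-VTH output with NEXT

    Figure 20  Chip micrograph and test setup

    Figure 21  Measured on-chip BER eye diagram and bathtub curve

    Figure 22  Simulated power breakdown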

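    Table 1  Calculation of the CIJ cancellation rate

    | | w/o NEXT (w/o NEXTC) | w/ NEXT (w/o NEXTC) | w/ NEXT (w/ NEXTC) | CIJ (w/o NEXTC) | CIJ (w/ NEXTC) | CIJ cancellation rate |
    | --- | --- | --- | --- | --- | --- | --- |
    | Peak-to-peak jitter (UIpp) | 0.30 | 0.88 | 0.36 | 0.58 | 0.06 | 89.6% |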
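The arithmetic behind Table 1 can be checked with a few lines of Python. This is a sketch under one assumption: that crosstalk-induced jitter (CIJ) is the peak-to-peak jitter measured with NEXT minus the jitter measured without NEXT, which is what the table's own entries imply (0.88 − 0.30 = 0.58 and 0.36 − 0.30 = 0.06). The function name is hypothetical.

```python
def cij_cancellation_rate(jitter_no_next, jitter_next, jitter_next_cancelled):
    """Return (CIJ without cancellation, residual CIJ, cancellation rate).

    All jitter values are peak-to-peak, in unit intervals (UIpp).
    Assumes CIJ adds linearly on top of the crosstalk-free jitter.
    """
    cij_raw = jitter_next - jitter_no_next
    cij_residual = jitter_next_cancelled - jitter_no_next
    rate = 1.0 - cij_residual / cij_raw
    return cij_raw, cij_residual, rate

# Rounded values taken from Table 1.
cij_raw, cij_residual, rate = cij_cancellation_rate(0.30, 0.88, 0.36)
print(f"CIJ without NEXTC: {cij_raw:.2f} UIpp")
print(f"CIJ with NEXTC:    {cij_residual:.2f} UIpp")
print(f"Cancellation rate: {rate:.1%}")
```

With the rounded table entries the rate comes out near 89.7%, within rounding of the 89.6% reported in the table.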

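    Table 2  Performance summary and comparison of die-to-die interconnect transceivers

    ISSCC'20[10]  JSSC'23[14]  JSSC'23[5]  JSSC'25[6]  ISSCC'25[22]  ISSCC'25[23]  This work
    Process (nm)  65  28  5  3  3  3  28
    Data rate (Gb/s)  4  10  50.4  32  32  16  64
    Modulation  NRZ  NRZ  NRZ/Bi-dir  NRZ  NRZ  NRZ  NRZ/Bi-dir
    Channel length (mm)  6  50.8  1.2  2  1.7  1.4  3
    Channel pitch (μm)  1  236  6.1
    Inter-channel shielding required  Ground shield  Ground shield  Ground shield  Ground shield
    Energy efficiency (pJ/b)  1.5  1.29  0.297  0.36  0.6  0.29  1.21
    Bandwidth density (Tb/(s·mm))  8  0.035  2.14  7.68  10.5  5.27  10.5
    Data routing layers  2  1  1  4  1
    Measured BER  <1e-12  <1e-12  <1e-12  <1e-12  <1e-11  <1e-16 (a)
    Crosstalk cancellation method / CIJ cancellation rate  FIR / 78%  Coding / 45%  FIR / 89.6%
    Echo cancellation method  FIR
    (a) Measured at the optimal sampling point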
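Two of the headline figures in Table 2 can be cross-checked from the other rows. The sketch below assumes that the last channel-pitch entry (6.1 μm) belongs to this work, that its data is routed on a single layer, and that the 1.21 pJ/b energy efficiency is normalized to the full 64 Gb/s per-lane rate; none of these is stated explicitly in the flattened table. The function names are hypothetical.

```python
def bandwidth_density_tbps_per_mm(rate_gbps, pitch_um, layers=1):
    """Edge bandwidth density; 1 Gb/s per μm is numerically 1 Tb/s per mm."""
    return rate_gbps / pitch_um * layers

def lane_power_mw(rate_gbps, energy_pj_per_bit):
    """Power per lane; Gb/s multiplied by pJ/b gives mW."""
    return rate_gbps * energy_pj_per_bit

density = bandwidth_density_tbps_per_mm(64, 6.1)  # assumed pitch of this work
power = lane_power_mw(64, 1.21)

print(f"bandwidth density: {density:.2f} Tb/s/mm")
print(f"power per lane:    {power:.2f} mW")
```

Under these assumptions the density is about 10.5 Tb/s/mm, matching the table, and the implied power is roughly 77 mW per lane.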
  • [1] TAYLOR G, FARJADRAD R, and VINNAKOTA B. High capacity on-package physical link considerations[C]. IEEE Symposium on High-Performance Interconnects (HOTI), Santa Clara, USA, 2019: 19–22. doi: 10.1109/HOTI.2019.00018.
    [2] SEO J, LEE S, LEE M, et al. A 20-Gb/s/pin 0.0024-mm² single-ended DECS TRX with CDR-less Self-slicing/auto-deserialization to improve tolerance on duty cycle error and RX supply noise for DCC/CDR-less short-reach memory interfaces[C]. IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2022: 1–3. doi: 10.1109/ISSCC42614.2022.9731763.
    [3] RIE H N, YOON C S, BYUN J, et al. A 40-Gb/s/pin low-voltage POD single-ended PAM-4 transceiver with timing calibrated reset-less slicer and bidirectional T-coil for GDDR7 application[C]. IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, USA, 2022: 148–149. doi: 10.1109/VLSITechnologyandCir46769.2022.9830507.
    [4] SEONG K, PARK D, BAE G, et al. A 4nm 32Gb/s 8Tb/s/mm die-to-die Chiplet using NRZ single-ended transceiver with equalization schemes and training techniques[C]. IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2023: 114–116. doi: 10.1109/ISSCC42615.2023.10067477.
    [5] NISHI Y, POULTON J W, TURNER W J, et al. A 0.297-pJ/bit 50.4-Gb/s/wire inverter-based short-reach simultaneous bi-directional transceiver for die-to-die interface in 5-nm CMOS[J]. IEEE Journal of Solid-State Circuits, 2023, 58(4): 1062–1073. doi: 10.1109/JSSC.2022.3232024.
    [6] GU Junhui, MA J, CHOWDHURY A A, et al. A 32 Gb/s 0.36 pJ/bit 3 nm chiplet IO using 2.5-D CoWoS package with real-time and per-lane CDR and bathtub monitoring[J]. IEEE Journal of Solid-State Circuits, 2025, 60(4): 1289–1298. doi: 10.1109/JSSC.2025.3545483.
    [7] PARK H, SONG J, LEE Y, et al. 23.3 A 3-bit/2UI 27Gb/s PAM-3 single-ended transceiver using one-tap DFE for next-generation memory interface[C]. IEEE International Solid-State Circuits Conference, San Francisco, USA, 2019: 382–384. doi: 10.1109/ISSCC.2019.8662462.
    [8] PARK H, SONG J, SIM J, et al. 30-Gb/s 1.11-pJ/bit single-ended PAM-3 transceiver for high-speed memory links[J]. IEEE Journal of Solid-State Circuits, 2021, 56(2): 581–590. doi: 10.1109/JSSC.2020.3006864.
    [9] FAN Yanghang, KUMAR A, IWAI T, et al. A 32-Gb/s simultaneous bidirectional source-synchronous transceiver with adaptive echo cancellation techniques[J]. IEEE Journal of Solid-State Circuits, 2020, 55(2): 439–451. doi: 10.1109/JSSC.2019.2956369.
    [10] KO H G, SHIN S, OH J, et al. 6.7 An 8Gb/s/µm FFE-combined crosstalk-cancellation scheme for HBM on silicon interposer with 3D-staggered channels[C]. IEEE International Solid-State Circuits Conference, San Francisco, USA, 2020: 128–130. doi: 10.1109/ISSCC19947.2020.9063162.
    [11] LEE S K, KIM B, PARK H J, et al. A 5 Gb/s single-ended parallel receiver with adaptive crosstalk-induced jitter cancellation[J]. IEEE Journal of Solid-State Circuits, 2013, 48(9): 2118–2127. doi: 10.1109/JSSC.2013.2264618.
    [12] LEE J, LEE K, SIM J Y, et al. A 246-fJ/b 13.3-Tb/s/mm single-ended current-mode transceiver with crosstalk cancellation for shield-less short-reach interconnect[C]. IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, USA, 2024: 1–2. doi: 10.1109/VLSITechnologyandCir46783.2024.10631466.
    [13] ZHONG Liping, WU Hongzhi, ZHANG Yangyi, et al. 7.6 A 112Gb/s/pin single-ended crosstalk-cancellation transceiver with 31dB loss compensation in 28nm CMOS[C]. IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2024: 134–136. doi: 10.1109/ISSCC49657.2024.10454508.
    [14] LIU Qian, DU Li, and DU Yuan. A 0.90-Tb/s/in 1.29-pJ/b wireline transceiver with single-ended crosstalk cancellation coding scheme for high-density interconnects[J]. IEEE Journal of Solid-State Circuits, 2023, 58(8): 2326–2336. doi: 10.1109/JSSC.2023.3261125.
    [15] RAZAVI B. The strongARM latch [a circuit for all seasons][J]. IEEE Solid-State Circuits Magazine, 2015, 7(2): 12–17. doi: 10.1109/MSSC.2015.2418155.
    [16] YE Bingyi, SHENG Kai, GAI Weixin, et al. A 2.29-pJ/b 112-Gb/s wireline transceiver with RX four-tap FFE for medium-reach applications in 28-nm CMOS[J]. IEEE Journal of Solid-State Circuits, 2023, 58(1): 19–29. doi: 10.1109/JSSC.2022.3223052.
    [17] BOGATIN E. Signal and Power Integrity[M]. 3rd ed. New York: Pearson Education, 2018: 457: 533.
    [18] ZHUANG Haoyu, CAO Wenzhen, PENG Xizhu, et al. A three-stage comparator and its modified version with fast speed and low kickback[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2021, 29(7): 1485–1489. doi: 10.1109/TVLSI.2021.3077624.
    [19] FIGUEIREDO P M and VITAL J C. Kickback noise reduction techniques for CMOS latched comparators[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2006, 53(7): 541–545. doi: 10.1109/TCSII.2006.875308.
    [20] UCIe Consortium. UCIe™ Specification 1.1 Universal chiplet interconnect express™[S]. UCIe, 2022.
    [21] LI Guansheng, LEE W, CUI Delong, et al. Standing wave based clock distribution technique with application to a 10×11 Gbps transceiver in 28 nm CMOS[C]. IEEE Asian Solid-State Circuits Conference (A-SSCC), Xiamen, China, 2015: 1–4. doi: 10.1109/ASSCC.2015.7387451.
    [22] LIN Mushan, TSAI C C, LI Shenggao, et al. 36.1 A 32Gb/s 10.5Tb/s/mm 0.6pJ/b UCIe-compliant low-latency interface in 3nm featuring matched-delay for dynamic clock gating[C]. IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2025: 586–588. doi: 10.1109/ISSCC49661.2025.10904767.
    [23] MELEK D T, NAVINKUMAR R, VANDERSAND J, et al. A 0.29pJ/b 5.27Tb/s/mm UCIe advanced package link in 3nm FinFET with 2.5D CoWoS packaging[C]. IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, 2025: 590–592. doi: 10.1109/ISSCC49661.2025.10904754.
Publication history
  • Received:  2025-06-03
  • Revised:  2025-07-24
  • Published online:  2025-08-06