通过主动加速恢复延长芯片寿命：机遇与挑战

郭鑫斐

doi:10.11999/JEIT230323

通过主动加速恢复延长芯片寿命：机遇与挑战

doi: 10.11999/JEIT230323 cstr: 32379.14.JEIT230323

郭鑫斐^,

上海交通大学密西根学院上海 200240

基金项目: 国家自然科学基金(62201340)

详细信息

作者简介:
郭鑫斐：男，助理教授，研究方向为集成电路可靠性、电子设计自动化、低功耗人工智能芯片

通讯作者:
郭鑫斐　xinfei.guo@sjtu.edu.cn

中图分类号: TN406
计量
- 文章访问数: 1596
- HTML全文浏览量: 1426
- PDF下载量: 229
- 被引次数: 0
出版历程
- 收稿日期: 2023-04-26
- 修回日期: 2023-07-25
- 网络出版日期: 2023-08-04
- 刊出日期: 2023-09-27

Active Accelerated Recovery for Extended Chip Lifetime: Opportunities and Challenges

GUO Xinfei^,

University of Michigan – Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China

Funds: The National Natural Science Foundation of China (62201340)

摘要

摘要: 新型工艺下芯片集成度的提高和尺寸的缩小导致了器件内部电场和电流密度的不断增加，使得老化问题日趋严重，当前针对老化主要的防护思路依然是采取保护带和预留时序裕量的方式，但该方法会导致过度设计。近年来，多项研究工作从实验的角度证明了芯片的主要老化机制具有一定的可恢复性，恢复过程且可被加速，从而大幅降低设计初期的时序裕量，由此启发了主动加速恢复的设计思路。该文回顾了老化防护的已有设计方法和主动加速恢复的相关进展，分析了主动加速恢复的潜在优势，并讨论了从模型、电路设计以及系统设计等角度进行片上实现所面临的瓶颈问题和相应的解决方法，提出了以感知-主动加速恢复为核心的自适应老化防护设计概念。
- 主动加速恢复 /
- 可靠性 /
- 老化 /
- 温度偏置不稳定性 /
- 电迁移
Abstract: The higher level of integration and smaller feature size in advanced technology nodes have led to increased electrical field and current density, which worsen further the chip aging issues. Current design solutions against aging are still based on guardband or extra timing margins, which can lead to overdesign. In recent years, multiple research work have demonstrated with experiments that several dominating aging effects can be recovered, and this recovery can be further accelerated. The necessary timing margin can be significantly lowered, inspiring the active accelerated recovery design concept. In this paper, current design solutions against aging and the progress that has been made in active accelerated recovery are reviewed. Potential opportunities introduced by this new method have been identified. Based on aspects of modeling, circuit design and system design, design challenges and their potential solutions to enable on-chip implementations are investigated. A concept of adaptive system that is based on sensing and active accelerated recovery has been proposed.
- Active accelerated recovery /
- Reliability /
- Aging /
- Bias Temperature Instability (BTI) /
- Electromigration

HTML全文

图 1 芯片使用场景的多样化和使用率的增加对于寿命的要求

下载: 全尺寸图片幻灯片

图 2 集成电路老化现象以及其影响示意图(以反相器电路为例)

下载: 全尺寸图片幻灯片

图 3 BTI和EM老化机理示意图

下载: 全尺寸图片幻灯片

图 4 当前常见老化防护方法总结

下载: 全尺寸图片幻灯片

图 5 针对BTI老化和EM老化的主动加速恢复的定义以及与其他状态的对比

下载: 全尺寸图片幻灯片

图 6 实验测得的电迁移(EM)老化过程以及其在高温环境中的主动加速恢复过程

下载: 全尺寸图片幻灯片

图 7 不同老化防护设计思路对于全寿命周期时序余量开销的直观影响

下载: 全尺寸图片幻灯片

图 8 周期性主动加速恢复与被动恢复的对比概念图

下载: 全尺寸图片幻灯片

图 9 传统老化防护方法与主动加速恢复设计方法设计空间的区别

下载: 全尺寸图片幻灯片

图 10 供电网络电迁移分析流程(以V_dd为例，此处忽略V_SS)

下载: 全尺寸图片幻灯片

图 11 可用作主动加速恢复的负偏置电压发生器电路原理图(基于文献[16]进行了修改)

下载: 全尺寸图片幻灯片

图 12 支持供电网络逆向电流的门控电路及其性能评估(基于文献[39]进行了修改)

下载: 全尺寸图片幻灯片

图 13 主动加速恢复情形下的电路设计空间探索模型示意图

下载: 全尺寸图片幻灯片

图 14 感知-主动加速恢复自适应系统集成方案

下载: 全尺寸图片幻灯片

参考文献(41)

[1]	HILL I, CHANAWALA P, SINGH R, et al. CMOS reliability from past to future: A survey of requirements, trends, and prediction methods[J]. IEEE Transactions on Device and Materials Reliability, 2022, 22(1): 1–18. doi: 10.1109/TDMR.2021.3131345
[2]	GUO Xinfei, VERMA V, GONZALEZ-GUERRERO P, et al. When “things” get older: Exploring circuit aging in IoT applications[C]. 2018 19th International Symposium on Quality Electronic Design (ISQED), Santa Clara, USA, 2018: 296–301.
[3]	TAN S, TAHOORI M, KIM T, et al. Long-Term Reliability of Nanometer VLSI Systems[M]. Cham: Springer, 2019.
[4]	CAO Yu, VELAMALA J, SUTARIA K, et al. Cross-layer modeling and simulation of circuit reliability[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2013, 33(1): 8–23. doi: 10.1109/TCAD.2013.2289874
[5]	WANG Runsheng, ZHANG Zuodong, SUN Zixuan, et al. Cross-layer design for reliability in advanced technology nodes: An EDA perspective[C]. 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), Nangjing, China, 2022: 1–4.
[6]	LIU Changze, REN Pengpeng, SUN Yongsheng, et al. Reliability challenges in advanced technology node: From transistor to circuit (invited)[C]. 2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT), Kunming, China, 2020: 1–4.
[7]	KUMAR P, KOLEY K, ASKARI S S A, et al. Assessment of negative bias temperature instability due to interface and oxide trapped charges in gate-all-around TFET devices[J]. IEEE Transactions on Nanotechnology, 2023, 22: 157–165. doi: 10.1109/TNANO.2023.3255012
[8]	BRAVAIX A, GUÉRIN C, HUARD V, et al. Hot-carrier acceleration factors for low power management in DC-AC stressed 40nm NMOS node at high temperature[C]. 2009 IEEE International Reliability Physics Symposium, Montreal, Canada, 2009: 531–548.
[9]	GARBA-SEYBOU T, FEDERSPIEL X, BRAVAIX A, et al. New modelling off-state TDDB for 130nm to 28nm CMOS nodes[C]. 2022 IEEE International Reliability Physics Symposium (IRPS), Dallas, USA, 2022: 11A. 3-1–1-11A. 3-7.
[10]	ROTHE S and LIENIG J. Combined modeling of electromigration, thermal and stress migration in AC interconnect lines[C/OL]. The 2023 International Symposium on Physical Design, 2023: 107–114.
[11]	YU Liting, REN Jianguo, LU Xian, et al. NBTI and HCI aging prediction and reliability screening during production test[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(10): 3000–3011. doi: 10.1109/TCAD.2019.2961329
[12]	BERNSTEIN K, FRANK D J, GATTIKER A E, et al. High-performance CMOS variability in the 65-nm regime and beyond[J]. IBM Journal of Research and Development, 2006, 50(4/5): 433–449. doi: 10.1147/rd.504.0433
[13]	JI Zhigang, CHEN Haibao, and LI Xiuyan. Design for reliability with the advanced integrated circuit (IC) technology: Challenges and opportunities[J]. Science China Information Sciences, 2019, 62(12): 226401. doi: 10.1007/s11432-019-2643-5
[14]	LEE K D. Electromigration recovery and short lead effect under bipolar-and unipolar-pulse current[C]. 2012 IEEE International Reliability Physics Symposium (IRPS), Anaheim, USA, 2012: 6B. 3.1–6B. 3.4.
[15]	MAHAPATRA S. Fundamentals of Bias Temperature Instability in MOS Transistors[M]. New Delhi: Springer, 2016.
[16]	GUO Xinfei and STAN M R. Circadian Rhythms for Future Resilient Electronic Systems[M]. Cham: Springer, 2020.
[17]	EVMORFOPOULOS N, SHOHEL M A A, AXELOU O, et al. Recent progress in the analysis of electromigration and stress migration in large multisegment interconnects[C/OL]. The 2023 International Symposium on Physical Design, 2023: 115–123.
[18]	朱炯, 易茂祥, 张姚, 等. 缓解电路NBTI效应的改进门替换技术[J]. 电子测量与仪器学报, 2016, 30(7): 1029–1036. doi: 10.13382/j.jemi.2016.07.007 ZHU Jiong, YI Maoxiang, ZHANG Yao, et al. Improved gate replacement technique for mitigating circuit NBTI effect[J]. Journal of Electronic Measurement and Instrumentation, 2016, 30(7): 1029–1036. doi: 10.13382/j.jemi.2016.07.007
[19]	ZHANG Xinfa, ZHANG Zuodong, LIN Yibo, et al. Efficient aging-aware standard cell library characterization based on sensitivity analysis[J]. IEEE Transactions on Circuits and Systems II:Express Briefs, 2023, 70(2): 721–725. doi: 10.1109/TCSII.2022.3212123
[20]	ZHANG Zuodong, WANG Runsheng, ZHANG Zhe, et al. Reliability-enhanced circuit design flow based on approximate logic synthesis[C]. The 2020 on Great Lakes Symposium on VLSI, Beijing, China, 2020: 71–76.
[21]	CHEN Xiaodao, LIAO Chen, WEI Tongquan, et al. An interconnect reliability-driven routing technique for electromigration failure avoidance[J]. IEEE Transactions on Dependable and Secure Computing, 2012, 9(5): 770–776. doi: 10.1109/TDSC.2010.57
[22]	GUO Xinfei and STAN M R. MCPENS: Multiple-critical-path embeddable NBTI sensors for dynamic wearout management[C]. The 11th IEEE Workshop on Silicon Errors in Logic System Effects, Austin, USA, 2015: 116–121.
[23]	PARK G, YU Hanzhao, KIM M, et al. An all BTI (N-PBTI, N-NBTI, P-PBTI, P-NBTI) odometer based on a dual power rail ring oscillator array[C]. 2021 IEEE International Reliability Physics Symposium (IRPS), Monterey, USA, 2021: 1–5.
[24]	ANGHEL L and CACHO F. Design-time exploration for process, environment and aging compensation techniques for low power reliable-aware design[J]. IEEE Transactions on Emerging Topics in Computing, 2022, 10(2): 581–590. doi: 10.1109/TETC.2021.3136288
[25]	MINTARNO E, SKAF J, ZHENG Rui, et al. Self-tuning for maximized lifetime energy-efficiency in the presence of circuit aging[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2011, 30(5): 760–773. doi: 10.1109/TCAD.2010.2100531
[26]	CHEN Yuguang, LIN I C, and WEI Yongche. A novel NBTI-aware chip remaining lifetime prediction framework using machine learning[C]. 2021 22nd International Symposium on Quality Electronic Design (ISQED), Santa Clara, USA, 2021: 476–481.
[27]	BU Ranran, REN Zhipeng, GE Hao, et al. Online NBTI-induced partially depleted (PD) SOI degradation and recovery prediction utilizing long short-term memory (LSTM)[J]. Microelectronics Reliability, 2023, 142: 114932. doi: 10.1016/j.microrel.2023.114932
[28]	LI Zeyu, HUANG Zhao, WANG Quan, et al. AMROFloor: An efficient aging mitigation and resource optimization floorplanner for virtual coarse-grained runtime reconfigurable FPGAs[J]. Electronics, 2022, 11(2): 273. doi: 10.3390/electronics11020273
[29]	SUN Peng, YANG Zhiming, YU Yang, et al. NBTI and power reduction using an input vector control and supply voltage assignment method[J]. Algorithms, 2017, 10(3): 94. doi: 10.3390/a10030094
[30]	WANG Shengcheng, SUN Zeyu, CHENG Yuan, et al. Leveraging recovery effect to reduce electromigration degradation in power/ground TSV[C]. 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Irvine, USA, 2017: 811–818.
[31]	GUO Xinfei, BURLESON W, and STAN M. Modeling and experimental demonstration of accelerated self-healing techniques[C]. 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, USA, 2014: 1–6.
[32]	VELAMALA J B, SUTARIA K, SATO T, et al. Physics matters: Statistical aging prediction under trapping/detrapping[C]. The 49th Annual Design Automation Conference, San Francisco, USA, 2012: 139–144.
[33]	STATHIS J H, MAHAPATRA S, and GRASSER T. Controversial issues in negative bias temperature instability[J]. Microelectronics Reliability, 2018, 81: 244–251. doi: 10.1016/j.microrel.2017.12.035
[34]	PALUMBO F, KLEBANOV M, MONREAL G, et al. Physical origin of the permanent components of the positive charge buildup resulting from NBTI/PBTI stress in nMOS/pMOS transistors[C]. 2022 IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), Singapore, 2022: 1–5.
[35]	TAO Jiang, CHEN J F, CHEUNG N W, et al. Modeling and characterization of electromigration failures under bidirectional current stress[J]. IEEE Transactions on Electron Devices, 1996, 43(5): 800–808. doi: 10.1109/16.491258
[36]	RINGLER I J and LLOYD J R. Stress relaxation in pulsed DC electromigration measurements[J]. AIP Advances, 2016, 6(9): 095118. doi: 10.1063/1.4963669
[37]	GUO Xinfei and STAN M R. Work hard, sleep well-avoid irreversible IC wearout with proactive rejuvenation[C]. 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), Macao, China, 2016: 649–654.
[38]	KAVOUSI M, CHEN Liang, and TAN S X D. Fast electromigration stress analysis considering spatial joule heating effects[C]. 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), Taipei, China, 2022: 208–213.
[39]	GUO Xinfei and STAN M R. Deep healing: Ease the BTI and EM wearout crisis by activating recovery[C]. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Denver, USA, 2017: 184–191.
[40]	GUO Xinfei and STAN M R. Implications of accelerated self-healing as a key design knob for cross-layer resilience[J]. Integration, 2017, 56: 167–180. doi: 10.1016/j.vlsi.2016.10.008
[41]	TING L M, MAY J S, HUNTER W R, et al. AC electromigration characterization and modeling of multilayered interconnects[C]. 31st Annual Proceedings Reliability Physics 1993, Atlanta, USA, 1993: 311–316.