Key Technology and Development of Triple Modular Redundancy Tool for FPGA
Abstract: SRAM-based field-programmable gate arrays (FPGAs) are susceptible to single event effects in the space radiation environment, which cause soft errors. Triple modular redundancy (TMR) is currently the most widely used circuit hardening technique for mitigating soft errors in FPGAs. This paper first reviews the current state of TMR research, then summarizes four key techniques commonly used in TMR tools, together with their implementation principles: fine-grained TMR, system partitioning, configuration scrubbing, and state synchronization. As high-level synthesis (HLS) for FPGAs matures, HLS-based TMR tools have gradually become a new research branch. The paper classifies and introduces the current mainstream TMR tools based on the register transfer level (RTL), TMR tools based on important soft-core resources, and the emerging TMR tools based on HLS. Finally, it summarizes the future development trends of TMR tools for FPGAs and offers an outlook.
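As a minimal illustration of the basic TMR idea the abstract refers to, the sketch below models three replicas of a data word and a bitwise 2-of-3 majority voter. The function names `majority_vote` and `flip_bit` and the example word are hypothetical and chosen only for illustration; this is a behavioral model in Python, not the hardware description produced by any of the tools surveyed.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs."""
    return (a & b) | (b & c) | (a & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


# Hypothetical 8-bit value produced by the fault-free circuit.
golden = 0b1011_0010

# Three replicas of the same logic; one of them suffers a soft error.
replica_a = golden
replica_b = flip_bit(golden, 3)
replica_c = golden

# The voter masks the single upset, so the voted output matches golden.
voted = majority_vote(replica_a, replica_b, replica_c)

# Two replicas hit in the same bit position defeat the voter.
double_fault = majority_vote(flip_bit(golden, 3), flip_bit(golden, 3), golden)
```

The second case suggests why errors should not be left to accumulate: a voter masks a single faulty replica, but it does not repair it, and a second upset in another replica can then outvote the correct one.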
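Table 1. Existing TMR tools

| Category | Category characteristics | Tool | Tool characteristics |
| --- | --- | --- | --- |
| RTL-based | Allows fine tuning of TMR implementation details; faces the problem of redundancy being optimized away during synthesis; requires detailed knowledge of the intermediate netlist files produced during synthesis | RASP-TMR | TMR for Verilog; developed in MATLAB; limited functionality |
| | | TMRG | TMR for Verilog; written in Python; actively maintained; well suited to academic exchange |
| | | Xilinx TMRTool | TMR on RTL-level .ngc netlist files; restricted under the International Traffic in Arms Regulations (ITAR) |
| | | BL-TMR | TMR on RTL-level .edif netlist files; the open-source version has long since stopped being updated |
| | | Mentor Precision Hi-Rel | TMR at the RTL synthesis stage; uses fine-grained TMR; safe state machine strategy based on Hamming encoding |
| | | Synopsys Synplify Premier | TMR at the RTL synthesis stage; similar to the Mentor tool; little documentation available online |
| HLS-based | Greatly shortens the design cycle; provides pipelined designs; reduces the negative timing impact of TMR; supports HLS design space exploration | TLegUp | TMR at the HLS stage; establishes the overall framework for this direction; updates have stalled because of commercialization constraints |
| | | C-TMR | TMR for C; supports HLS design space exploration; not yet a complete tool |
| Soft-core-based | Provides fully featured protection for the soft core, but offers TMR optimization only for MicroBlaze, so its scope of use is narrow | Xilinx Vivado MicroBlaze TMR | TMR for the soft core; a TMR subsystem made up of five IP cores; automatically manages and masks faults affecting the MicroBlaze soft core |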