OpenPARF: An Open-source Placement and Routing Framework for Large-scale Heterogeneous FPGAs with Deep Learning Toolkit
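Abstract: This paper proposes OpenPARF, an open-source placement and routing framework for large-scale field-programmable gate arrays (FPGAs). The framework is implemented on the deep learning toolkit PyTorch and supports massively parallel computation on GPUs. For placement, it introduces a novel asymmetric multi-electrostatic field system to model the FPGA placement problem. For routing, it accurately models the routing resources inside FPGA configurable logic blocks (CLBs) and routes on large-scale irregular routing resource graphs, improving the performance and efficiency of routers for heterogeneous FPGAs. Experiments on the ISPD 2016 and ISPD 2017 FPGA contest benchmarks and on industrial-level FPGA benchmarks show that the framework reduces routed wirelength by 0.4% to 12.7% and speeds up placement by more than 2×.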
Keywords:
- Integrated circuit design and design automation
- Physical implementation
- FPGA
- Placement and routing
- Machine learning
Abstract: This paper proposes OpenPARF, an Open-source Placement And Routing Framework for large-scale FPGA physical design. OpenPARF is implemented with the deep learning toolkit PyTorch and supports massively parallel GPU acceleration. For placement, the framework incorporates a novel asymmetric multi-electrostatic field system to model the FPGA placement problem. For routing, OpenPARF integrates finer-grained modeling of the internal routing of FPGA Configurable Logic Blocks (CLBs) into the routing model and supports routing on large-scale irregular routing resource graphs, which improves the efficiency and effectiveness of routing for heterogeneous FPGAs. Experimental results on the ISPD 2016 and ISPD 2017 FPGA contest benchmarks and on industrial-level FPGA benchmarks demonstrate that OpenPARF achieves a 0.4% to 12.7% improvement in routed wirelength and a placement speedup of more than two times.
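The electrostatic analogy behind this family of placers can be illustrated with a small sketch. It is not OpenPARF code and does not model the asymmetric part of the proposed system: it only assumes, as in electrostatics-based placers generally, that each resource type gets its own density map, that the potential satisfies a Poisson equation with zero-flux borders, and that the resulting field pushes instances away from crowded bins. The grid size, the Gauss-Seidel solver, and every name below are illustrative assumptions.

```python
# Sketch only: one independent electrostatic field per resource type.
# Grid size, solver choice, and all names are assumptions for illustration.

def density_map(instances, nx, ny):
    """Accumulate instance area into unit bins; instances are (x, y, area)."""
    rho = [[0.0] * nx for _ in range(ny)]
    for x, y, area in instances:
        rho[int(y)][int(x)] += area
    return rho

def solve_potential(rho, sweeps=500):
    """Gauss-Seidel solve of laplace(psi) = -(rho - mean) with zero-flux borders."""
    ny, nx = len(rho), len(rho[0])
    mean = sum(map(sum, rho)) / (nx * ny)  # remove the mean so the system is solvable
    psi = [[0.0] * nx for _ in range(ny)]
    for _ in range(sweeps):
        for j in range(ny):
            for i in range(nx):
                # Only in-grid neighbours contribute, which enforces zero flux at borders.
                nbrs = [psi[b][a]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < nx and 0 <= b < ny]
                psi[j][i] = (sum(nbrs) + rho[j][i] - mean) / len(nbrs)
    return psi

def field_x(psi, i, j):
    """x-component of E = -grad(psi), central difference (interior columns only)."""
    return -(psi[j][i + 1] - psi[j][i - 1]) / 2.0

# Two hypothetical resource types, each clustered in a different corner.
instances = {
    "LUT": [(0.5, 0.5, 1.0), (0.2, 0.7, 1.0), (0.8, 0.1, 1.0)],  # all in bin (0, 0)
    "FF": [(3.5, 3.5, 1.0), (3.1, 3.9, 1.0)],                     # all in bin (3, 3)
}
fields = {rtype: solve_potential(density_map(insts, nx=4, ny=4))
          for rtype, insts in instances.items()}

lut_push = field_x(fields["LUT"], 1, 0)  # bin just right of the LUT cluster
ff_push = field_x(fields["FF"], 2, 3)    # bin just left of the FF cluster
print(f"LUT field at (1,0): {lut_push:+.3f}  FF field at (2,3): {ff_push:+.3f}")
```

In this toy setup the LUT field next to the LUT cluster points in +x and the FF field next to the FF cluster points in -x, so each resource type is spread by its own field, independently of the other.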
Table 1 Numbers of logic instances and nets in the ISPD 2016 and ISPD 2017 benchmarks
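| ISPD 2016 design | LUT (k) | FF (k) | RAM | DSP | Nets (k) | ISPD 2017 design | LUT (k) | FF (k) | RAM | DSP | Nets (k) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FPGA01 | 50 | 55 | 0 | 0 | 105 | CLK-FPGA01 | 211 | 324 | 164 | 75 | 536 |
| FPGA02 | 100 | 66 | 100 | 100 | 167 | CLK-FPGA02 | 230 | 280 | 236 | 112 | 511 |
| FPGA03 | 250 | 170 | 600 | 500 | 428 | CLK-FPGA03 | 410 | 481 | 850 | 395 | 898 |
| FPGA04 | 250 | 172 | 600 | 500 | 430 | CLK-FPGA04 | 309 | 372 | 467 | 224 | 685 |
| FPGA05 | 250 | 174 | 600 | 500 | 433 | CLK-FPGA05 | 393 | 469 | 798 | 150 | 865 |
| FPGA06 | 350 | 352 | 1000 | 600 | 713 | CLK-FPGA06 | 425 | 511 | 872 | 420 | 943 |
| FPGA07 | 350 | 355 | 1000 | 600 | 716 | CLK-FPGA07 | 254 | 309 | 313 | 149 | 565 |
| FPGA08 | 500 | 216 | 600 | 500 | 725 | CLK-FPGA08 | 212 | 257 | 161 | 75 | 470 |
| FPGA09 | 500 | 366 | 1000 | 600 | 876 | CLK-FPGA09 | 231 | 358 | 236 | 112 | 591 |
| FPGA10 | 350 | 600 | 1000 | 600 | 961 | CLK-FPGA10 | 327 | 506 | 542 | 255 | 837 |
| FPGA11 | 480 | 363 | 1000 | 400 | 851 | CLK-FPGA11 | 300 | 468 | 454 | 224 | 772 |
| FPGA12 | 500 | 602 | 600 | 500 | 1111 | CLK-FPGA12 | 277 | 430 | 389 | 187 | 710 |
| – | – | – | – | – | – | CLK-FPGA13 | 339 | 405 | 570 | 262 | 749 |

Table 2 Placement time (s), routing time (min), and routed wirelength (×10^3) on the industrial-level FPGA benchmarks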
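| Design | #LUT/#FF/#BRAM/#DSP | #DiRAM+#SHIFT | Nets | Placement time (s) | Routing time (min) | Routed wirelength (×10^3) |
|---|---|---|---|---|---|---|
| IND01 | 17k/11k/0/13 | 9 | 52492 | 72.36 | 10 | 90 |
| IND02 | 11k/10k/0/24 | 6 | 26678 | 77.82 | 15 | 100 |
| IND03 | 109k/12k/0/0 | 0 | 121554 | 109.54 | 108 | 1021 |
| IND04 | 29k/17k/0/16 | 218 | 60968 | 69.39 | 19 | 283 |
| IND05 | 64k/191k/64/928 | 29k | 371808 | 126.38 | 109 | 2360 |
| IND06 | 112k/65k/21/0 | 0 | 221182 | 88.28 | 176 | 1593 |
| IND07 | 40k/156k/89/768 | 26k | 294075 | 140.33 | 68 | 1450 |

Table 3 Comparison of placement time (s), routing time (min), and routed wirelength (×10^4) on the ISPD 2016 benchmarks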
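PT: placement time (s); RT: routing time (min); WL: routed wirelength (×10^4).

| Design | RippleFPGA (CPU) PT | RippleFPGA (CPU) RT | RippleFPGA (CPU) WL | DREAMPlaceFPGA (GPU) PT | DREAMPlaceFPGA (GPU) RT | DREAMPlaceFPGA (GPU) WL | OpenPARF (CPU) PT | OpenPARF (CPU) RT | OpenPARF (CPU) WL | OpenPARF (GPU) PT | OpenPARF (GPU) RT | OpenPARF (GPU) WL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FPGA01 | 41.12 | 3 | 36.44 | 32.08 | 3 | 31.78 | 422 | 2 | 31.75 | 38.58 | 3 | 31.72 |
| FPGA02 | 64.22 | 5 | 75.29 | 56.82 | 5 | 68.17 | 719 | 4 | 67.86 | 59.08 | 5 | 67.73 |
| FPGA03 | 245.40 | 17 | 346.91 | 107.96 | 15 | 299.56 | 760 | 15 | 294.75 | 119.75 | 15 | 294.75 |
| FPGA04 | 337.42 | 22 | 632.96 | 97.22 | 22 | 569.85 | 727 | 22 | 577.30 | 111.85 | 22 | 577.30 |
| FPGA05 | 391.42 | 57 | 1222.46 | 90.60 | 56 | 1167.60 | 899 | 58 | 1148.40 | 122.78 | 54 | 1148.72 |
| FPGA06 | 593.06 | 25 | 652.41 | 182.62 | 29 | 571.11 | 975 | 29 | 573.95 | 218.17 | 27 | 573.54 |
| FPGA07 | 782.33 | 46 | 1106.96 | 159.07 | 51 | 964.44 | 924 | 46 | 966.37 | 208.84 | 45 | 965.24 |
| FPGA08 | 489.97 | 40 | 958.26 | 146.33 | 36 | 911.67 | 921 | 38 | 895.47 | 184.13 | 38 | 896.91 |
| FPGA09 | 737.86 | 58 | 1327.34 | 190.63 | 54 | 1203.49 | 1036 | 52 | 1198.43 | 259.76 | 50 | 1198.22 |
| FPGA10 | 1179.94 | 28 | 711.48 | 179.50 | 28 | 544.52 | 1082 | 26 | 542.11 | 258.04 | 35 | 542.02 |
| FPGA11 | 721.43 | 58 | 1281.65 | 147.88 | 56 | 1250.49 | 1020 | 59 | 1254.15 | 220.95 | 59 | 1253.78 |
| FPGA12 | 883.19 | 40 | 761.37 | 183.95 | 37 | 674.21 | 1177 | 40 | 670.31 | 290.43 | 38 | 670.94 |
| Average | 2.771 | 1.015 | 1.127 | 0.786 | 1.000 | 1.004 | 6.170 | 0.957 | 1.000 | 1.000 | 1.000 | 1.000 |

Table 4 Comparison of placement time (s), routing time (min), and routed wirelength (×10^4) on the ISPD 2017 benchmarks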
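PT: placement time (s); RT: routing time (min); WL: routed wirelength (×10^4).

| Design | RippleFPGA (CPU) PT | RippleFPGA (CPU) RT | RippleFPGA (CPU) WL | OpenPARF (CPU) PT | OpenPARF (CPU) RT | OpenPARF (CPU) WL | OpenPARF (GPU) PT | OpenPARF (GPU) RT | OpenPARF (GPU) WL |
|---|---|---|---|---|---|---|---|---|---|
| CLK-FPGA01 | 277.78 | 10 | 238.54 | 864 | 9 | 205.79 | 130.97 | 10 | 205.44 |
| CLK-FPGA02 | 249.99 | 15 | 261.85 | 782 | 13 | 247.73 | 126.83 | 14 | 246.65 |
| CLK-FPGA03 | 537.36 | 24 | 648.69 | 963 | 26 | 593.15 | 205.98 | 24 | 594.00 |
| CLK-FPGA04 | 346.45 | 18 | 440.09 | 860 | 19 | 419.74 | 156.97 | 19 | 420.30 |
| CLK-FPGA05 | 501.15 | 25 | 560.18 | 962 | 23 | 510.30 | 201.01 | 23 | 510.62 |
| CLK-FPGA06 | 545.09 | 28 | 678.43 | 988 | 26 | 617.94 | 217.75 | 28 | 617.28 |
| CLK-FPGA07 | 288.26 | 13 | 276.29 | 795 | 13 | 256.62 | 136.39 | 13 | 256.62 |
| CLK-FPGA08 | 234.69 | 10 | 213.06 | 672 | 9 | 196.63 | 119.32 | 10 | 196.67 |
| CLK-FPGA09 | 311.68 | 13 | 297.02 | 807 | 14 | 251.27 | 148.38 | 14 | 250.98 |
| CLK-FPGA10 | 464.83 | 23 | 544.07 | 930 | 25 | 449.52 | 194.63 | 14 | 451.28 |
| CLK-FPGA11 | 421.12 | 24 | 516.67 | 897 | 23 | 422.01 | 181.58 | 30 | 421.52 |
| CLK-FPGA12 | 377.65 | 18 | 403.59 | 862 | 19 | 335.13 | 167.02 | 20 | 336.03 |
| CLK-FPGA13 | 393.25 | 21 | 464.78 | 880 | 20 | 427.86 | 177.86 | 19 | 428.41 |
| Average | 2.251 | 1.037 | 1.125 | 5.305 | 1.036 | 1.000 | 1.000 | 1.000 | 1.000 |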
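The final (average) row of Table 4 is not a mean of the raw values. It appears to be the arithmetic mean of per-design ratios taken against OpenPARF (GPU), which is why that column reads 1.000 throughout. The short check below recomputes the two placement-time entries from the per-design numbers in Table 4; the helper name `mean_ratio` is introduced here for illustration only.

```python
# Placement times (s) copied from the Table 4 rows, CLK-FPGA01 to CLK-FPGA13.
ripple_cpu = [277.78, 249.99, 537.36, 346.45, 501.15, 545.09, 288.26,
              234.69, 311.68, 464.83, 421.12, 377.65, 393.25]
openparf_cpu = [864, 782, 963, 860, 962, 988, 795,
                672, 807, 930, 897, 862, 880]
openparf_gpu = [130.97, 126.83, 205.98, 156.97, 201.01, 217.75, 136.39,
                119.32, 148.38, 194.63, 181.58, 167.02, 177.86]

def mean_ratio(values, baseline):
    """Arithmetic mean of the per-design ratios value / baseline."""
    ratios = [v / b for v, b in zip(values, baseline)]
    return sum(ratios) / len(ratios)

ripple_avg = mean_ratio(ripple_cpu, openparf_gpu)
cpu_avg = mean_ratio(openparf_cpu, openparf_gpu)
print(f"RippleFPGA (CPU) vs OpenPARF (GPU): {ripple_avg:.3f}")
print(f"OpenPARF (CPU) vs OpenPARF (GPU):   {cpu_avg:.3f}")
```

Under this reading the two results agree with the 2.251 and 5.305 entries in the table, and the first is consistent with the abstract's claim of a placement speedup of more than two times over RippleFPGA.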