FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization

WANG Yanlin; GAO Lijiang; YANG Haigang

doi:10.11999/JEIT260108

WANG Yanlin, GAO Lijiang, YANG Haigang. FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260108

doi: 10.11999/JEIT260108 cstr: 32379.14.JEIT260108

WANG Yanlin^{1, 2}, GAO Lijiang^{3}, YANG Haigang^{2}

1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2. University of Chinese Academy of Sciences, Beijing 100094, China
3. Beijing Zhongke Shengxin Technology Co., Ltd., Beijing 100081, China

Funds: The National Natural Science Foundation of China (61876172)

Accepted Date: 2026-02-14
Rev Recd Date: 2026-02-14

Available Online: 2026-03-04

Abstract

Six-input look-up tables (6-LUTs) are widely used in commercial Field-Programmable Gate Arrays (FPGAs) to build programmable logic blocks, yet related experiments show that, on average, fewer than 30% of them implement full 6-input functions in mapped circuits, which wastes a significant share of programmable resources. In this paper, 6-LUTs are fractured according to a fracturable factor and recombined at different granularities to construct several new Hybrid Basic Logic Elements (HBLEs). Based on these HBLEs, several novel Hybrid Programmable Logic Block (HPLB) architectures are proposed and used to replace the Programmable Logic Blocks (PLBs) of Xilinx devices. A statistical evaluation algorithm for the mapped netlist is also proposed, and the HPLB architectures are experimentally verified and evaluated. Evaluations of the three enhanced architectures show that the HPLBs achieve an average area reduction of more than 30% compared with Xilinx’s PLBs without adding input ports. The HPLB architecture constructed with a fracturable factor N=3 gives the best results when both HPLB utilization and area optimization are taken into account: on the MCNC and VTR benchmarks, resource consumption decreases by an average of 8.27% and 27.64%, respectively, thereby improving FPGA logic efficiency.

Objective: Modern commercial FPGA architectures employ 6-LUTs as the fundamental building blocks of Basic Logic Elements (BLEs). Experimental results show that only about 30% of the Logic Elements (LEs) in a circuit are ultimately mapped to full 6-LUTs when 6-LUT BLEs are targeted. When a 6-LUT implements a function with fewer than six inputs, half or more of its logic resources are wasted, so a significant waste of programmable resources is unavoidable.
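
The waste described above can be quantified with a minimal sketch. It assumes the standard LUT model, in which a K-input LUT stores 2^K truth-table bits; the function name `used_fraction` is illustrative and does not come from the paper.

```python
# Share of a 6-LUT's configuration bits that a K-input function needs.
# Assumption: a K-input LUT stores 2**K truth-table (SRAM) bits.

LUT_SIZE = 6  # inputs of the physical LUT


def used_fraction(k: int, lut_size: int = LUT_SIZE) -> float:
    """Return the share of the 2**lut_size SRAM bits needed by a k-input function."""
    if not 0 <= k <= lut_size:
        raise ValueError("k must be between 0 and lut_size")
    return 2 ** k / 2 ** lut_size


for k in range(2, LUT_SIZE + 1):
    wasted = 1 - used_fraction(k)
    print(f"{k}-input function: {wasted:.1%} of the 6-LUT's SRAM bits unused")
```

Under this model a 5-input function leaves exactly half of the 64 bits unused, and every smaller function leaves more.
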
According to experimental data, a circuit design mapped to 100 4-LUTs can be mapped to 78 6-LUTs, with the {6,5,4,3,2}-LUT function distribution being {23,32,17,9,13}. Only around 25% of the 6-LUTs are ultimately mapped to 6-input functions, and the remaining 6-LUTs are underutilized. This further illustrates how inefficient technology mapping is for LUTs with a large input count K.

Methods: The fracturable factor N, the number of sub-LUTs that can be obtained from a single LUT, characterizes the fracturable and reconfigurable nature of LUT architectures in FPGAs. Motivated by this, we decompose a 6-LUT into several granularities according to the fracturable factor to address the low resource utilization described above. Three novel Hybrid Basic Logic Element (HBLE) structures are created by connecting and reconfiguring the resulting sub-LUTs with additional input ports and multiplexer modules, and we investigate how these three HBLE structures improve FPGA performance. The HBLE2 structure consists of one unfractured 6-LUT and one fracturable 6-LUT that splits into two 5-LUTs (N=2). The HBLE3 structure consists of one unfractured 6-LUT and one fracturable 6-LUT that splits into one 5-LUT and two 4-LUTs (N=3). The HBLE4 structure consists of one unfractured 6-LUT and one fracturable 6-LUT that splits into four 4-LUTs (N=4). All three HBLE structures support adder units and provide both latched and direct combinational outputs; they also allow a latched output that bypasses the combinational logic. A Hybrid Programmable Logic Block (HPLB) is formed by combining several HBLEs.
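
The three fracturing schemes can be written down as data and checked for consistency. The sketch below assumes that fracturing reuses the 64 configuration bits of the original 6-LUT without adding SRAM; the paper's actual circuit is not shown here, and the names `FRACTURED_SUB_LUTS` and `sram_bits` are illustrative.

```python
# Sub-LUT sizes obtained from the fracturable 6-LUT of each HBLE, as listed
# in the text. The dictionary layout is illustrative, not taken from the paper.
FRACTURED_SUB_LUTS = {
    "HBLE2": [5, 5],        # N = 2: two 5-LUTs
    "HBLE3": [5, 4, 4],     # N = 3: one 5-LUT and two 4-LUTs
    "HBLE4": [4, 4, 4, 4],  # N = 4: four 4-LUTs
}

SRAM_BITS_6LUT = 2 ** 6  # 64 truth-table bits in one 6-LUT


def sram_bits(sub_luts):
    """Total truth-table bits needed by a list of sub-LUT input counts."""
    return sum(2 ** k for k in sub_luts)


for name, subs in FRACTURED_SUB_LUTS.items():
    n = len(subs)  # fracturable factor N = number of sub-LUTs
    # Assumption: the sub-LUTs exactly partition the 64 bits of the 6-LUT.
    assert sram_bits(subs) == SRAM_BITS_6LUT
    print(f"{name}: N={n}, sub-LUTs={subs}, SRAM bits={sram_bits(subs)}")
```

Each scheme accounts for all 64 bits, which is consistent with the three structures differing in granularity rather than in storage.
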
The MCNC and VTR circuit sets, the two best-known academic benchmark (BM) suites, are chosen for experimental evaluation. Each circuit set is mapped to a Xilinx Virtex-7 FPGA, and the types and numbers of LUTs used are counted from the mapped netlist. After the data are sorted with the corresponding greedy algorithm, the minimum number of CLBs required is obtained. Since each Xilinx CLB contains eight 6-LUTs, the greedy approach takes (total number of LUTs) / 8 as the minimum number of CLBs required after BM mapping. To ensure comparable conditions, after Xilinx’s CLB structure is replaced with an HPLB structure proposed in this paper, each structure is likewise sorted with the greedy algorithm, which yields the minimum number of HPLBs required. In actual packing, routing constraints make it impossible to use every LUT in the mapped CLBs. The result obtained after greedy restructuring therefore represents the minimum achievable under theoretically optimal conditions.

Results and Discussions: When CLBs are replaced by HPLBs to map the MCNC circuit set, the average number of blocks required drops by about 8% for both HPLB2 and HPLB3, whereas the HPLB4 structure increases the number required by more than 30% on average. For the VTR circuit set, all three HPLB structures require fewer blocks than CLBs: the HPLB2 and HPLB4 counts drop by less than 10% on average, whereas the HPLB3 count drops by around 30%. The hybrid structure enables SRAM scheduling and full use of the input pins. By contrast, the uniform CLB structure wastes resources when implementing functions with a small LUT input count K and therefore requires more CLBs. According to the post-mapping HPLB counts, the HPLB4 structure performs worse than the HPLB3 structure.
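
The counting procedure can be illustrated with a small sketch. Only the CLB baseline (total LUTs divided by 8) comes from the text; rounding up, the largest-first slot assignment, the figure of four HBLEs per HPLB, and the example LUT counts are all assumptions made for illustration, and input-pin sharing and routing are ignored. This is not the paper's algorithm.

```python
import math


def min_clbs(lut_counts, luts_per_clb=8):
    """CLB baseline from the text: total LUTs / 8, rounded up (assumed)."""
    return math.ceil(sum(lut_counts.values()) / luts_per_clb)


def fits(funcs, num_hbles, subs):
    """Check whether funcs (input counts, sorted descending) fit in num_hbles HBLEs."""
    big = sum(1 for k in funcs if k > max(subs))  # these need an unfractured 6-LUT
    total_luts = 2 * num_hbles  # one fixed and one fracturable 6-LUT per HBLE
    if big > total_luts:
        return False
    # Fracture every fracturable LUT that is not needed whole for a big function.
    fractured = min(num_hbles, total_luts - big)
    slots = [6] * (total_luts - fractured) + list(subs) * fractured
    slots.sort(reverse=True)
    # Greedy largest-first matching: i-th largest function into i-th largest slot.
    return len(funcs) <= len(slots) and all(f <= s for f, s in zip(funcs, slots))


def min_hbles(lut_counts, subs):
    """Smallest number of HBLEs that holds all functions under the model above."""
    if any(k > 6 for k in lut_counts):
        raise ValueError("functions with more than 6 inputs are not supported")
    funcs = sorted((k for k, n in lut_counts.items() for _ in range(n)), reverse=True)
    if not funcs:
        return 0
    n = 1
    while not fits(funcs, n, subs):
        n += 1
    return n


# Made-up example: number of mapped functions per input count (not paper data).
EXAMPLE_COUNTS = {6: 4, 5: 6, 4: 5, 3: 3, 2: 2}
HBLES_PER_HPLB = 4  # assumption: same eight 6-LUTs per block as a Xilinx CLB

print("CLBs:", min_clbs(EXAMPLE_COUNTS))
for name, subs in {"HPLB2": [5, 5], "HPLB3": [5, 4, 4], "HPLB4": [4, 4, 4, 4]}.items():
    hbles = min_hbles(EXAMPLE_COUNTS, subs)
    print(name, "HBLEs:", hbles, "HPLBs:", math.ceil(hbles / HBLES_PER_HPLB))
```

In this model a 5-input function cannot use a 4-LUT slot, so a netlist rich in 5- and 6-input functions forces an N=4 block to keep its fracturable LUT whole, which is one plausible reading of why HPLB4 needs more blocks on some circuits.
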
Analysis of the post-mapping area shows that both the MCNC and VTR circuit sets achieve average area reductions above 30%. On the MCNC set, all three HPLB structures achieve area reductions of about 31%. On the VTR set the results differ: HPLB2 gives an average area reduction of 30.63% and HPLB4 gives 51.21%, while HPLB3 gives 45.22%, slightly less than HPLB4. A closer look at the area results shows that a higher fracturable factor N integrates the small LUTs in a circuit more effectively, so the enhanced architectures achieve higher area reduction ratios.

Conclusions: To address the low resource utilization of 6-LUTs, this paper proposes three HPLB architectures based on different fracturing granularities. The HPLBs replace Xilinx’s CLB structure, and an evaluation procedure with matching algorithms is established to examine their benefit in resource utilization. Evaluation on the MCNC and VTR circuit suites, based on the proportions of the different LUT sizes in the post-mapping netlist, shows that HPLB4 achieves significant area optimization but requires additional HPLBs, which increases interconnect area. Both HPLB2 and HPLB3 achieve average area reductions above 30%, and as the test circuits grow, HPLB3 achieves a markedly greater reduction in HPLB count and area than HPLB2. Considering both HPLB count and area optimization, the HPLB3 structure therefore offers the most balanced improvement after replacing the CLB structure and greatly improves the utilization of programmable resources.
- Field Programmable Gate Array,
- Programmable Logic Block,
- Look-up Table,
- Mapping,
- Fracturable factor
