FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization

WANG Yanlin, GAO Lijiang, YANG Haigang

Citation: WANG Yanlin, GAO Lijiang, YANG Haigang. FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260108


doi: 10.11999/JEIT260108 cstr: 32379.14.JEIT260108
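Funding: National Natural Science Foundation of China (61876172)
    Author biographies:

    WANG Yanlin: male, PhD candidate; research interests: programmable chip architecture design and FPGA CAD design

    GAO Lijiang: male, PhD; research interests: programmable chip architecture design and FPGA CAD design

    YANG Haigang: male, research professor and doctoral supervisor; research interests: large-scale integrated circuit design and electronic design automation (EDA)

    Corresponding author:

    YANG Haigang yanghg@mail.ie.ac.cn

  • CLC number: TN047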

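  • Abstract: Commercial field-programmable gate arrays (FPGAs) generally build their programmable logic blocks (PLBs) from 6-input lookup tables (LUTs). Experiments show, however, that the use of 6-input LUTs in circuits averages no more than 30%, so a large share of programmable resources is wasted. Building on the concept of the fracturable factor, this paper splits the 6-input LUT at different granularities and recombines the pieces into three new hybrid-granularity programmable logic elements. These elements are then assembled into three new hybrid programmable logic block (HPLB) structures that replace the Xilinx programmable logic block. An optimization evaluation algorithm that works on statistics gathered from the post-mapping netlist is also proposed. Finally, the three improved structures are verified and evaluated experimentally. The results show that, without adding input port resources, replacing the Xilinx programmable logic block with any of the three HPLBs reduces area by more than 30% on average. Considering both the number of PLBs used and the area reduction, the HPLB built with fracturable factor N=3 performs best: on the MCNC and VTR benchmark suites it improves resource utilization by 8.27% and 27.64% on average, respectively, which effectively raises FPGA resource utilization.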
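    The 30% figure in the abstract rests on counting LUT sizes in technology-mapped netlists. The sketch below shows one way such a share could be computed. The function name `lut_share` and the example netlist are hypothetical and are not taken from the paper; the paper's own counting procedure may differ.

    ```python
    from collections import Counter


    def lut_share(lut_sizes, k):
        """Return the fraction of LUTs in a mapped netlist that use exactly k inputs."""
        counts = Counter(lut_sizes)
        total = sum(counts.values())
        if total == 0:
            raise ValueError("netlist contains no LUTs")
        return counts[k] / total


    # Hypothetical mapped netlist: one entry per LUT, value = number of used inputs.
    example_netlist = [2] * 10 + [3] * 15 + [4] * 25 + [5] * 25 + [6] * 25
    share6 = lut_share(example_netlist, 6)
    print(f"6-LUT share: {share6:.0%}")
    ```

    In this made-up netlist a quarter of the LUTs need all six inputs, so the remaining three quarters would leave part of a monolithic 6-LUT unused.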
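  • Figure 1  Schematic of a BLE-based PLB structure

    Figure 2  Cascading 4-LUTs into 5-LUT and 6-LUT structures

    Figure 3  Schematic of the Xilinx CLB and SLICE structure

    Figure 4  (a) HBLE2 structure (N=2) (b) HBLE3 structure (N=3) (c) HBLE4 structure (N=4)

    Figure 5  HPLB structures built from the three different HBLEs

    Figure 6  Flowchart of the architecture evaluation scheme

    Figure 7  Improvement ratios on the MCNC and VTR benchmark suites after replacing the CLB with the HPLB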

    Table 1  Greedy algorithm for counting the HPLBs needed when HPLB2 replaces the Virtex-7 CLB structure

     Input: netlist_lut_set: the set of counts of every LUT type found in the post-mapping netlists of the benchmark (BM) circuits;
     lut5_set/lut6_set: the number of 5-LUTs/6-LUTs in netlist_lut_set;
     slut4_set: the total number of K-LUTs (K=1,2,3,4) in netlist_lut_set;
     Output: ple: the number of PLEs required by netlist_lut_set, as computed by the greedy algorithm (rounded up);
     read(netlist_lut_set);
     stop←max(slut4_set/6, lut5_set/3, lut6_set/3);
     for ple←1 to stop do
      x←lut6_set-ple*3;
      if x≥0 then
       y←ple*3-x-lut5_set;
       if y<0 then
        z←ple*6-x*2+y*2;
        result←z-slut4_set;
        if result≥0 then
         write(ple);
        endif
       else
        z←ple*6-x*2+y;
        result←z-slut4_set;
        if result≥0 then
         write(ple);
        endif
       endif
      else
       y←ple*3-x-lut5_set;
       if y<0 then
        z←ple*6+y*2;
        result←z-slut4_set;
        if result≥0 then
         write(ple);
        endif
       else
        z←ple*6+y;
        result←z-slut4_set;
        if result≥0 then
         write(ple);
        endif
       endif
      endif
     endfor
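    The search pattern in Table 1, trying increasing PLE counts until the LUT counts fit, can be sketched in a few lines. The capacity model below is an assumption made only for illustration and is not the paper's HBLE2 packing rule: each PLE is taken to offer three slots, a 6-LUT or a 5-LUT occupies one slot, and two LUTs with at most four inputs share one slot. The names `SLOTS_PER_PLE`, `slots_needed`, and `min_ple` are hypothetical.

    ```python
    import math

    SLOTS_PER_PLE = 3  # assumption: three fracturable LUT slots per PLE


    def slots_needed(lut6, lut5, slut4):
        """Slots used under the assumed model.

        A 6-LUT or a 5-LUT takes one slot; two LUTs with <=4 inputs share one slot.
        """
        return lut6 + lut5 + math.ceil(slut4 / 2)


    def min_ple(lut6, lut5, slut4):
        """Return the smallest PLE count whose slots cover the given LUT counts."""
        need = slots_needed(lut6, lut5, slut4)
        ple = 0
        # Try 0, 1, 2, ... PLEs and stop at the first count that is large enough.
        while ple * SLOTS_PER_PLE < need:
            ple += 1
        return ple


    # Made-up counts: four 6-LUTs, three 5-LUTs, five smaller LUTs.
    print(min_ple(lut6=4, lut5=3, slut4=5))
    ```

    Under this simplified model the loop is equivalent to a single ceiling division; it is written as a search only to mirror the iteration over `ple` in the table.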

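    Table 2  Number of PLBs required after mapping the MCNC and VTR benchmark circuits onto the CLB and HPLB structures

    | MCNC BM | #Xilinx CLB | #HPLB2 | #HPLB3 | #HPLB4 | VTR BM | #Xilinx CLB | #HPLB2 | #HPLB3 | #HPLB4 |
    |---|---|---|---|---|---|---|---|---|---|
    | alu4 | 33 | 30 | 35 | 53 | bgm | 1565 | 1391 | 1243 | 1856 |
    | apex2 | 31 | 28 | 29 | 43 | blob_merge | 778 | 692 | 519 | 639 |
    | apex4 | 20 | 26 | 26 | 39 | boundtop | 153 | 136 | 102 | 131 |
    | bigkey | 86 | 77 | 89 | 129 | ch_intrinsics | 4 | 4 | 3 | 4 |
    | clma | 23 | 21 | 19 | 27 | diffeq1 | 62 | 55 | 41 | 50 |
    | des | 56 | 50 | 48 | 69 | diffeq2 | 33 | 30 | 22 | 27 |
    | diffeq | 58 | 52 | 54 | 80 | LU32PEEng | 77 | 69 | 52 | 73 |
    | dsip | 86 | 77 | 64 | 86 | LU64PEEng | 84 | 75 | 57 | 80 |
    | elliptic | 201 | 179 | 182 | 272 | LU8PEEng | 68 | 61 | 50 | 72 |
    | ex1010 | 22 | 29 | 29 | 43 | mcml | 6599 | 5866 | 4399 | 6036 |
    | ex5p | 13 | 12 | 12 | 18 | mkDelayWorker32B | 303 | 269 | 248 | 371 |
    | frisc | 219 | 195 | 180 | 270 | mkPktMerge | 4 | 4 | 3 | 3 |
    | misex3 | 23 | 20 | 23 | 35 | mkSMAdapter4B | 138 | 123 | 112 | 165 |
    | pdc | 21 | 19 | 19 | 27 | raygentop | 138 | 123 | 110 | 159 |
    | s298 | 3 | 2 | 2 | 3 | sha | 164 | 146 | 109 | 139 |
    | s38417 | 182 | 162 | 153 | 224 | spree | 72 | 64 | 59 | 88 |
    | s38584 | 243 | 216 | 211 | 317 | stereovision0 | 397 | 353 | 265 | 317 |
    | seq | 88 | 81 | 93 | 139 | stereovision1 | 2070 | 1841 | 1585 | 2232 |
    | spla | 20 | 18 | 19 | 29 | stereovision2 | 1181 | 1050 | 788 | 945 |
    | tseng | 84 | 75 | 57 | 80 | stereovision3 | 10 | 9 | 8 | 11 |
    | Geometric mean | 75.6 | 68.45 | 67.2 | 99.15 | Geometric mean | 695 | 618.05 | 488.75 | 669.9 |
    | Improvement | - | 8.09% | 8.27% | -34.97% | Improvement | - | 9.78% | 27.64% | 3.62% |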
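    The source does not state how the summary rows of Table 2 were derived. As a hedged illustration only, the sketch below shows the conventional definitions of a geometric mean and of a relative reduction against the CLB baseline, applied to three MCNC rows of the table. The function names are hypothetical, and the results are not claimed to reproduce the table's summary values.

    ```python
    import math


    def geometric_mean(values):
        """Geometric mean of strictly positive numbers, computed via logarithms."""
        if not values or any(v <= 0 for v in values):
            raise ValueError("values must be non-empty and strictly positive")
        return math.exp(sum(math.log(v) for v in values) / len(values))


    def reduction(baseline, candidate):
        """Relative reduction of candidate versus baseline (positive = fewer blocks)."""
        return (baseline - candidate) / baseline


    # (Xilinx CLB count, HPLB3 count) for three MCNC circuits from Table 2.
    rows = {"alu4": (33, 35), "apex2": (31, 29), "clma": (23, 19)}

    for name, (clb, hplb3) in rows.items():
        print(f"{name}: {reduction(clb, hplb3):+.2%}")

    gm_clb = geometric_mean([clb for clb, _ in rows.values()])
    gm_hplb3 = geometric_mean([h for _, h in rows.values()])
    print(f"reduction of geometric means: {reduction(gm_clb, gm_hplb3):+.2%}")
    ```

    A negative value means the hybrid structure needs more blocks than the CLB baseline for that circuit, as happens for alu4 with HPLB3.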

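    Table 3  Comparison of optimization results and structural features of the three HPLB structures

    | Architecture | Average HPLB count reduction over BMs (%) | Average area reduction over BMs (%) | Structural features |
    |---|---|---|---|
    | HPLB2 | 8.94 | 31.27 | ① HPLB2 tile port count is close to that of the Xilinx CLB; ② area reduction exceeds 30%; ③ HPLB count reduction is below 10% |
    | HPLB3 | 18.53 | 38.32 | ① best HPLB count reduction, close to 20%; ② high area reduction, close to 40%; ③ HPLB3 tile has the most ports |
    | HPLB4 | -14.05 | 42.26 | ① best area reduction, above 40%; ② each HPLB contains only two HBLEs, so the tile has the fewest ports; ③ HPLB count increases by more than 10% |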
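    One way to read Table 3 is as a two-objective comparison between HPLB count reduction and area reduction. The sketch below applies a plain Pareto-dominance check to the three rows of the table. This is an illustrative reading and not the selection criterion stated by the authors; the helper names are hypothetical.

    ```python
    # (HPLB count reduction %, area reduction %) taken from Table 3.
    results = {
        "HPLB2": (8.94, 31.27),
        "HPLB3": (18.53, 38.32),
        "HPLB4": (-14.05, 42.26),
    }


    def dominates(a, b):
        """True if a is at least as good as b on every metric and better on one."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


    def pareto_front(points):
        """Names of the entries that no other entry dominates, in sorted order."""
        return sorted(
            name
            for name, p in points.items()
            if not any(dominates(q, p) for other, q in points.items() if other != name)
        )


    print(pareto_front(results))  # ['HPLB3', 'HPLB4']
    ```

    HPLB3 is better than HPLB2 on both metrics, while HPLB4 trades a higher block count for the largest area saving, which is consistent with the abstract singling out N=3 when both metrics are weighed together.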
Publication history
  • Revised: 2026-02-14
  • Accepted: 2026-02-14
  • Published online: 2026-03-04
