Bayesian Optimization-Driven Design Space Exploration Method for Coarse-Grained Reconfigurable Cipher Logic Array
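Abstract: The design space of the Coarse-Grained Reconfigurable Cipher Logic Array (CGRCA) is enormous, so design evaluation is time-consuming, and manual exploration yields low-quality solutions at low search efficiency. Targeting the high-dimensional, multi-objective optimization characteristics of the CGRCA architecture, this paper proposes a multi-objective design space exploration method based on Bayesian optimization that improves solution quality while balancing throughput, area, and FU utilization. First, the method obtains initial samples with a knowledge-aware unsupervised learning sampling strategy, ensuring that they are representative and diverse. Second, a rapid evaluation model is built to evaluate the samples quantitatively, shortening performance evaluation time. Third, an adaptive multi-acquisition function is designed, a greedy-based hybrid surrogate model is built, and a multi-objective Bayesian optimization method is proposed to search for the optimal CGRCA architecture, improving search efficiency and generality. Experimental results show that, compared with other design space exploration methods, the proposed method reduces the Average Distance from Reference Set (ADRS) by up to 34.9%, increases hypervolume by 28.7%, increases throughput by 29.9%, reduces area by 6.0%, and improves FU utilization by 11.6%, while exhibiting excellent cross-algorithm stability.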
Keywords:
- Coarse-grained reconfigurable cipher logic array /
- Design space exploration /
- Bayesian optimization /
- Random forest /
- Neural network
Abstract:

Objective: Coarse-Grained Reconfigurable Cipher Logic Arrays (CGRCAs) are widely employed in information security systems owing to their high flexibility, strong performance, and inherent security. Design Space Exploration (DSE) plays a critical role in evaluating and optimizing the performance of cryptographic algorithms deployed on CGRCAs. However, conventional DSE approaches require extensive computation time to locate optimal solutions in multi-objective optimization problems and often yield suboptimal performance. To overcome these limitations, this study proposes a Bayesian optimization-based DSE framework, termed Multi-Objective Bayesian Optimization-based Exploration (MOBE), which enhances search efficiency and solution quality while effectively satisfying the complex design requirements of CGRCA architectures.

Methods: The high-dimensional characteristics and multi-objective optimization features of the CGRCA are analyzed, and its design space is systematically modeled. A DSE method based on Bayesian optimization is then proposed, comprising initial sampling design, rapid evaluation model construction, surrogate model development, and acquisition function optimization. A knowledge-aware unsupervised learning sampling strategy is introduced to integrate domain-specific knowledge with clustering algorithms, thereby improving the representativeness and diversity of the initial samples. A rapid evaluation model is established to estimate throughput, area overhead, and Function Unit (FU) utilization for each sample, effectively reducing the computational cost of performance evaluation. To enhance both search efficiency and generalizability, a greedy-based hybrid surrogate model is constructed by combining Gaussian Process with Deep Kernel Learning (DKL-GP), random forest, and neural network models.
Moreover, an adaptive multi-acquisition function is designed by integrating Expected Hypervolume Improvement (EHVI) and quasi-Monte Carlo Upper Confidence Bound (qUCB) to identify the most promising samples and maintain a balanced trade-off between exploration and exploitation. The weighting ratio between EHVI and qUCB is dynamically adjusted to accommodate the varying optimization requirements across different search phases.

Results and Discussions: The DSE method based on Bayesian optimization (Algorithm 2) includes initial sampling design, rapid evaluation model construction, surrogate model development, and acquisition function optimization to enhance solution quality and search efficiency. Simulation results show that the knowledge-aware unsupervised learning sampling strategy reduces the Average Distance from Reference Set (ADRS) by up to 28.2% and increases hypervolume by 15.1% compared with existing sampling approaches (Table 3). This improvement primarily arises from the integration of domain knowledge with clustering algorithms. Compared with single surrogate model–based DSE methods, the greedy-based hybrid surrogate model leverages the complementary advantages of multiple surrogate models across different optimization stages, prioritizing samples that contribute most to hypervolume expansion. The hybrid surrogate model achieves a reduction in ADRS of up to 31.7% and an improvement in hypervolume of 20.0% (Table 4). Furthermore, the proposed MOBE framework achieves a 34.9% reduction in ADRS and increases hypervolume by 28.7% relative to state-of-the-art DSE methods (Table 5). Regarding the average performance metrics of Pareto-front samples, MOBE enhances throughput by up to 29.9%, reduces area overhead by 6.0%, and improves FU utilization by 11.6% (Fig. 6), confirming its superiority in overall solution quality.
Moreover, the MOBE method exhibits excellent cross-algorithm stability in both hypervolume and Normalized Overall Execution Time (NOET) (Fig. 7 and Table 6).

Conclusions: This study presents a multi-objective DSE method based on Bayesian optimization that enhances both solution quality and search efficiency for CGRCA. The proposed approach employs a knowledge-aware unsupervised learning sampling strategy to generate an initial sample set with high representativeness and diversity. A rapid evaluation model is subsequently developed to reduce the computational cost of performance assessments. Additionally, the integration of adaptive multi-acquisition functions with a greedy-based hybrid surrogate model further improves the efficiency and generalization capability of the DSE framework. Comparative experiments demonstrate the effectiveness of the proposed MOBE method: (1) the sampling strategy reduces the ADRS by up to 28.2% and increases hypervolume by 15.1% compared with existing methods; (2) the greedy-based hybrid surrogate model achieves up to a 31.7% reduction in ADRS and a 20.0% improvement in hypervolume relative to single surrogate model–based approaches; (3) the overall MOBE framework achieves a 34.9% reduction in ADRS and a 28.7% increase in hypervolume compared with state-of-the-art DSE techniques; (4) MOBE improves throughput by up to 29.9%, reduces area overhead by 6.0%, and increases FU utilization by 11.6% relative to existing methods; and (5) MOBE exhibits excellent cross-algorithm stability in hypervolume and NOET. MOBE is applicable to medium- and high-performance cryptographic application scenarios, including cloud platforms and desktop terminals. Nevertheless, two limitations remain. First, MOBE currently employs only traditional surrogate models, which may constrain feature learning efficiency and modeling accuracy.
Second, its validation is confined to a CGRCA architecture previously developed by the research group, lacking verification across existing CGRCA architectures. Future work will address these limitations by incorporating emerging artificial intelligence techniques, such as large models, and conducting extensive experiments on diverse CGRCA architectures to further enhance the generalization and effectiveness of MOBE.
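As a minimal sketch of how the solution-quality metrics named above can be computed, the Python below extracts a Pareto front and evaluates ADRS against a reference set. It rests on several assumptions that are not taken from the paper: every objective is converted to a maximization (area is negated), ADRS uses plain Euclidean distance without normalization, the design points are made up, and the function names are illustrative only.

```python
import math

def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one.
    Assumption: all objectives are maximized (area is stored negated)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def adrs(reference, approx):
    """Average distance from reference set: for each reference point, take the
    distance to the closest explored point, then average over the reference set.
    Assumption: Euclidean distance; the paper may use a different measure."""
    return sum(min(math.dist(r, a) for a in approx) for r in reference) / len(reference)

# Hypothetical design points: (throughput, -area, FU utilization).
A = (0.9, -0.5, 0.6)
B = (0.6, -0.3, 0.7)
C = (0.5, -0.6, 0.5)   # dominated by A in all three objectives
D = (0.8, -0.4, 0.4)

front = pareto_front([A, B, C, D])   # reference Pareto front
explored = [A, C]                    # points found by some exploration run
score = adrs(front, explored)        # lower is better; 0 means the front was fully recovered
```

A lower ADRS means the explored points lie closer to the reference front, which is why the abstract reports ADRS as a reduction and hypervolume as an increase.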
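Table 1 CGRCA design parameters

| Parameter | Symbol | Level | Value range |
| --- | --- | --- | --- |
| Number of reconfigurable processing stages | r | CGRA | 1~32 |
| Number of PEs per reconfigurable processing stage | c | CGRA | 4~8 |
| Number of logic units per PE | FU1 | Processing element | 1~4 |
| Number of modular addition units per PE | FU2 | Processing element | 1~4 |
| Number of modular multiplication units per PE | FU3 | Processing element | 1~4 |
| Number of shift units per PE | FU4 | Processing element | 1~4 |
| Number of permutation units per PE | FU5 | Processing element | 1~4 |
| Number of finite-field multiplication units per PE | FU6 | Processing element | 1~4 |
| Bit width of the forward cross-stage interconnect network | K1 | Global interconnect | 1~4 |
| Bit width of the backward feedback interconnect network | K2 | Global interconnect | 1~4 |
| Cross-stage length of the forward cross-stage interconnect network | P1 | Global interconnect | 4~32 |
| Cross-stage length of the backward feedback interconnect network | P2 | Global interconnect | 4~32 |
| Number of memories | MN | Memory | 4~16 |

Algorithm 1 Description of the knowledge-aware unsupervised learning sampling strategy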
Input: design space D; number of initial samples N
Output: initial sample set X
(1) X ← $\varnothing$;
(2) T ← Halton(D, N); // build the candidate sample set
(3) l, LR ← Hierarchical_Cluster(T, l_max, weight); // compute the number of sub-layers l and the set LR of reconfigurable-processing-stage value ranges of the sub-layers
(4) LSN ← NPS(l, LR); // compute the set LSN of sample clusters of all sub-layers
(5) for i ← 1 to l do
(6)     $LS_i$ ← $LS_i \cup$ Halton($lsn_i$, $lr_i$, Len($lsn_i$)); // compute the candidate sample set $LS_i$ of sub-layer i
(7)     $p_i$, $C_i$ ← EC_Kmeans($LS_i$, cn_max); // compute the number of clusters $p_i$ in sub-layer i and the set $C_i$ of all clusters in sub-layer i
(8)     for j ← 1 to $p_i$ do
(9)         $x_{ij}^*$ ← Centroid($c_{ij}$); // take the cluster centroid as the candidate sample
(10)     end for
(11)     while not converged do
(12)         for j ← 1 to $p_i$ do
(13)             for all $x \in c_{ij}$ do
(14)                 $R(x) \leftarrow \dfrac{1}{|c_{ij}| - 1} \times \displaystyle\sum\nolimits_{x' \in c_{ij}} \|x - x'\|$; // evaluate representativeness
(15)                 $D(x) \leftarrow \min\nolimits_{x^* \in \{x_{in}^*\}_{n=1}^{p_i} \setminus \{x_{ij}^*\}} \|x - x^*\|$; // evaluate diversity
(16)             end for
(17)             $x_{ij} \leftarrow \arg\max_{x \in c_{ij}} [D(x) - R(x)]$;
(18)             $\{x_{in}^*\}_{n=1}^{p_i} \leftarrow \{x_{in}^*\}_{n=1}^{p_i} \cup \{x_{ij}\} \setminus \{x_{ij}^*\}$;
(19)         end for
(20)     end while
(21) return $X = \{\{x_{mn}^*\}_{n=1}^{p_m}\}_{m=1}^{l}$

Algorithm 2 Description of the MOBE algorithm
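Input: design space D; number of initial samples N; number of iterations M
Output: Pareto-optimal set P; optimal solution P*
(1) X ← Ini_Sampling(D, N); // initial sampling
(2) Y ← Evaluation(X); // evaluate performance
(3) D ← D \ X;
(4) Q ← (X, Y);
(5) Initialize surrogate models;
(6) HV ← $\varnothing$; // initialize the hypervolume record
(7) for i ← 1 to M do
(8)     C ← Halton(D, m); // uniformly sample m points as the candidate sample set
(9)     $x_{1i}$ ← arg max(MAcq(C, M1)); // select the sample with the largest acquisition function value under the DKL-GP model
(10)     $x_{2i}$ ← arg max(MAcq(C, M2)); // select the sample with the largest acquisition function value under the random forest model
(11)     $x_{3i}$ ← arg max(MAcq(C, M3)); // select the sample with the largest acquisition function value under the neural network model
(12)     $x_i^*$ ← arg max(MAcq($x_{1i}$, $x_{2i}$, $x_{3i}$)); // select the best sample of this iteration
(13)     $y_i^*$ ← Evaluation($x_i^*$); // evaluate performance
(14)     Q ← Q $\cup$ {$x_i^*, y_i^*$};
(15)     D ← D \ $x_i^*$;
(16)     HV ← HV $\cup$ Cal_HV(Q); // update the hypervolume
(17) end for
(18) P ← Pareto(Q); // compute the Pareto-optimal set
(19) P* ← Max_TH(P); // select the solution with the highest throughput as the optimal solution
(20) return Pareto-optimal set P and optimal solution P*

Table 2 Design parameter settings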
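| Design parameter set No. | r | c | FU1 | FU2 | FU3 | FU4 | FU5 | FU6 | K1 | K2 | P1 | P2 | MN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2 | 4 | 1 | 2 | 1 | 3 | 1 | 1 | 1 | 4 | 5 | 18 | 16 |
| 2 | 5 | 5 | 2 | 1 | 1 | 1 | 4 | 1 | 2 | 2 | 8 | 4 | 8 |
| 3 | 8 | 8 | 1 | 4 | 2 | 3 | 1 | 2 | 2 | 1 | 13 | 24 | 4 |
| 4 | 10 | 8 | 1 | 3 | 1 | 2 | 1 | 4 | 4 | 2 | 32 | 14 | 9 |
| 5 | 15 | 6 | 4 | 1 | 3 | 2 | 1 | 2 | 4 | 4 | 2 | 30 | 12 |
| 6 | 18 | 4 | 1 | 1 | 4 | 1 | 2 | 1 | 1 | 2 | 15 | 16 | 7 |
| 7 | 20 | 8 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 1 | 18 | 9 | 15 |
| 8 | 24 | 7 | 1 | 1 | 1 | 1 | 4 | 3 | 3 | 1 | 28 | 25 | 5 |
| 9 | 28 | 4 | 3 | 2 | 1 | 3 | 1 | 2 | 2 | 3 | 12 | 6 | 10 |
| 10 | 32 | 4 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 16 | 16 | 4 |

Table 3 Experimental results of different sampling strategies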
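| Sampling algorithm | ADRS | Hypervolume | NOET |
| --- | --- | --- | --- |
| MOBE-RS | 0.039 | 0.557 | 1.000 |
| MOBE-MS | 0.035 | 0.573 | 0.984 |
| MOBE-US | 0.032 | 0.595 | 0.980 |
| MOBE | 0.028 | 0.641 | 0.934 |

Table 4 Experimental results of different surrogate models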
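| Surrogate model | ADRS | Hypervolume | NOET |
| --- | --- | --- | --- |
| MOBE-RF | 0.041 | 0.577 | 0.443 |
| MOBE-GP | 0.034 | 0.534 | 0.788 |
| MOBE-NN | 0.031 | 0.541 | 0.326 |
| MOBE | 0.028 | 0.641 | 0.934 |

Table 6 Comparison of the CV of multiple metrics across DSE methods (%)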
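| DSE method | Hypervolume CV | ADRS CV | NOET CV |
| --- | --- | --- | --- |
| MOBE | 10.29 | 22.56 | 6.74 |
| AUGER | 12.68 | 19.33 | 7.27 |
| BOOM-Explorer | 12.38 | 16.25 | 7.14 |
| MOBE-NN | 17.16 | 22.64 | 4.21 |
| MOBE-GP | 16.31 | 21.57 | 5.31 |
| MOBE-RF | 12.03 | 33.03 | 16.33 |
| MOBE-US | 14.90 | 24.97 | 8.25 |
| MOBE-MS | 14.28 | 24.46 | 8.38 |
| MOBE-RS | 18.36 | 19.39 | 7.88 |

Table 5 Experimental results of MOBE, BOOM-Explorer, and AUGER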
| DSE method | ADRS | Hypervolume | NOET |
| --- | --- | --- | --- |
| BOOM-Explorer | 0.043 | 0.498 | 0.708 |
| AUGER | 0.038 | 0.538 | 0.927 |
| MOBE | 0.028 | 0.641 | 0.934 |
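The headline "up to" percentages quoted in the abstract can be recomputed from the ADRS and hypervolume columns of Tables 3, 4, and 5. The short Python check below is only an arithmetic sketch: the values are copied from those tables, while the dictionary layout and the helper name `best_gains` are illustrative choices rather than anything from the paper.

```python
# (ADRS, hypervolume) for the full MOBE method, as listed in Tables 3-5.
MOBE = (0.028, 0.641)

# Baselines copied from the tables: name -> (ADRS, hypervolume).
TABLES = {
    "Table 3": {"MOBE-RS": (0.039, 0.557), "MOBE-MS": (0.035, 0.573), "MOBE-US": (0.032, 0.595)},
    "Table 4": {"MOBE-RF": (0.041, 0.577), "MOBE-GP": (0.034, 0.534), "MOBE-NN": (0.031, 0.541)},
    "Table 5": {"BOOM-Explorer": (0.043, 0.498), "AUGER": (0.038, 0.538)},
}

def best_gains(baselines, mobe=MOBE):
    """Return (largest ADRS reduction %, largest hypervolume increase %) over the baselines."""
    adrs_cut = max((adrs - mobe[0]) / adrs for adrs, _ in baselines.values())
    hv_gain = max((mobe[1] - hv) / hv for _, hv in baselines.values())
    return round(100 * adrs_cut, 1), round(100 * hv_gain, 1)

for name, baselines in TABLES.items():
    print(name, best_gains(baselines))
```

Each "up to" figure is the largest relative gain over any baseline in the corresponding table, so the ADRS and hypervolume maxima may come from different baselines (in Table 4, MOBE-RF for ADRS and MOBE-GP for hypervolume).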
[1] DESHWAL A, JAYAKODI N K, JOARDAR B K, et al. MOOS: A multi-objective design space exploration and optimization framework for NoC enabled manycore systems[J]. ACM Transactions on Embedded Computing Systems (TECS), 2019, 18(5s): 77. doi: 10.1145/3358206.
[2] KIRKPATRICK S, GELATT JR C D, and VECCHI M P. Optimization by simulated annealing[J]. Science, 1983, 220(4598): 671–680. doi: 10.1126/science.220.4598.671.
[3] DEB K, PRATAP A, AGARWAL S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II[J]. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182–197. doi: 10.1109/4235.996017.
[4] ZHANG Qingfu and LI Hui. MOEA/D: A multiobjective evolutionary algorithm based on decomposition[J]. IEEE Transactions on Evolutionary Computation, 2007, 11(6): 712–731. doi: 10.1109/TEVC.2007.892759.
[5] WENG Jian, LIU Sihao, DADU V, et al. DSAGEN: Synthesizing programmable spatial accelerators[C]. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, Valencia, Spain, 2020: 268–281. doi: 10.1109/ISCA45697.2020.00032.
[6] TAN Cheng, XIE Chenhao, LI Ang, et al. AURORA: Automated refinement of coarse-grained reconfigurable accelerators[C]. 2021 Design, Automation & Test in Europe Conference & Exhibition, Grenoble, France, 2021: 1388–1393. doi: 10.23919/DATE51398.2021.9473955.
[7] BANDARA T K, WIJERATHNE D, MITRA T, et al. REVAMP: A systematic framework for heterogeneous CGRA realization[C]. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 2022: 918–932. doi: 10.1145/3503222.3507772.
[8] JOARDAR B K, KIM R G, DOPPA J R, et al. Learning-based application-agnostic 3D NoC design for heterogeneous manycore systems[J]. IEEE Transactions on Computers, 2019, 68(6): 852–866. doi: 10.1109/TC.2018.2889053.
[9] QI Sirui, LI Yingheng, PASRICHA S, et al. MOELA: A multi-objective evolutionary/learning design space exploration framework for 3D heterogeneous manycore platforms[C]. 2023 Design, Automation & Test in Europe Conference & Exhibition, Antwerp, Belgium, 2023: 1–6. doi: 10.23919/DATE56975.2023.10137276.
[10] KIM R G, DOPPA J R, and PANDE P P. Machine learning for design space exploration and optimization of manycore systems[C]. 2018 IEEE/ACM International Conference on Computer-Aided Design, San Diego, USA, 2018: 1–6. doi: 10.1145/3240765.3243483.
[11] LOPES A S B and PEREIRA M M. A machine learning approach to accelerating DSE of reconfigurable accelerator systems[C]. 2020 33rd Symposium on Integrated Circuits and Systems Design, Campinas, Brazil, 2020: 1–6. doi: 10.1109/SBCCI50935.2020.9189899.
[12] LI Jingyuan, QIU Yunhui, ZHU Guowei, et al. THRAM: A template-based heterogeneous CGRA modeling framework supporting fast DSE[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10182204.
[13] PENG Bingbing, SUN Shaoyang, DAI Yuan, et al. PRAD: A Bayesian optimization-based DSE framework for parameterized reconfigurable architecture design[C]. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines, Marina Del Rey, USA, 2023: 226–226. doi: 10.1109/FCCM57271.2023.00054.
[14] KUANG Huizhen, ZHENG Su, and WANG Lingli. Automated design space exploration of coarse-grained reconfigurable architecture via Bayesian optimization[C]. 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology, Nanjing, China, 2022: 1–3. doi: 10.1109/ICSICT55466.2022.9963336.
[15] DAI Yuan, LI Jingyuan, ZHU Qilong, et al. HETA: A heterogeneous temporal CGRA modeling and design space exploration via Bayesian optimization[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024, 32(3): 505–518. doi: 10.1109/TVLSI.2023.3344536.
[16] BAI Chen, SUN Qi, ZHAI Jianwang, et al. BOOM-Explorer: RISC-V BOOM microarchitecture design space exploration framework[C]. 2021 IEEE/ACM International Conference on Computer Aided Design, Munich, Germany, 2021: 1–9. doi: 10.1109/ICCAD51958.2021.9643455.
[17] LI Jingyuan, HU Yihan, DAI Yuan, et al. AUGER: A multi-objective design space exploration framework for CGRAs[C]. 2023 International Conference on Field Programmable Technology, Yokohama, Japan, 2023: 88–95. doi: 10.1109/ICFPT59805.2023.00015.
[18] MENG Pingfan, ALTHOFF A, GAUTIER Q, et al. Adaptive threshold non-Pareto elimination: Re-thinking machine learning for system level design space exploration on FPGAs[C]. 2016 Design, Automation & Test in Europe Conference & Exhibition, Dresden, Germany, 2016: 918–923.
[19] KIM Y, MAHAPATRA R N, and CHOI K. Design space exploration for efficient resource utilization in coarse-grained reconfigurable architecture[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, 18(10): 1471–1482. doi: 10.1109/TVLSI.2009.2025280.
[20] CHEN Sichao, MAO Yiqing, DAI Yuan, et al. FCE: A fast CGRA architecture exploration framework[C]. 2024 IEEE 17th International Conference on Solid-State & Integrated Circuit Technology, Zhuhai, China, 2024: 1–3. doi: 10.1109/ICSICT62049.2024.10832017.
[21] WANG Duo, LIU Jinglei, YAN Mingyu, et al. Acceleration methods for processor microarchitecture design space exploration: A survey[J]. Journal of Computer Research and Development, 2025, 62(1): 22–57. doi: 10.7544/issn1000-1239.202330348.