高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

融合大语言模型与强化学习的敏捷卫星任务分配算法设计

陈盈果 王斐然 胡云鹏 杨斌 严冰

陈盈果, 王斐然, 胡云鹏, 杨斌, 严冰. 融合大语言模型与强化学习的敏捷卫星任务分配算法设计[J]. 电子与信息学报. doi: 10.11999/JEIT250991
引用本文: 陈盈果, 王斐然, 胡云鹏, 杨斌, 严冰. 融合大语言模型与强化学习的敏捷卫星任务分配算法设计[J]. 电子与信息学报. doi: 10.11999/JEIT250991
CHEN Yingguo, WANG Feiran, HU Yunpeng, YANG Bin, YAN Bing. Automating Algorithm Design for Agile Satellite Task Assignment with Large Language Models and Reinforcement Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250991
Citation: CHEN Yingguo, WANG Feiran, HU Yunpeng, YANG Bin, YAN Bing. Automating Algorithm Design for Agile Satellite Task Assignment with Large Language Models and Reinforcement Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250991

融合大语言模型与强化学习的敏捷卫星任务分配算法设计

doi: 10.11999/JEIT250991 cstr: 32379.14.JEIT250991
基金项目: 国家自然科学基金(U23B2039)
详细信息
    作者简介:

    陈盈果:男,博士,副教授,研究方向为卫星任务规划

    王斐然:男,博士生,讲师,研究方向为卫星任务规划

    胡云鹏:男,博士,工程师,研究方向为卫星任务规划

    杨斌:男,硕士,正高级工程师,研究方向为卫星任务规划

    严冰:男,博士,工程师,研究方向为卫星任务规划

    通讯作者:

    王斐然 wfrlyy@nudt.edu.cn

  • 中图分类号: TP301.6

Automating Algorithm Design for Agile Satellite Task Assignment with Large Language Models and Reinforcement Learning

Funds: The National Natural Science Foundation of China (U23B2039)
  • 摘要: 多敏捷对地观测卫星任务调度(MAEOSMSP)是一个复杂的NP-难问题,其求解算法的设计长期受限于对专家经验的依赖和场景适应性差的瓶颈。为突破这一局限,该文提出一种创新的自适应算法设计(AAD)框架,通过深度融合大语言模型(LLM)与强化学习(RL),实现调度算法的自动化生成与智能应用。框架的核心是一个离线进化-在线决策架构:离线阶段,利用LLM驱动的进化计算自动生成超越人类范式的分配算法,构建强大算法库;在线阶段,通过RL智能体根据实时求解状态,从算法库中动态选择最优算子,实现对新场景的强大泛化能力。在多个标准算例上的实验表明,AAD框架生成的任务分配算法在性能上全面超越了传统专家设计算法,尤其在复杂场景下性能提升高达9.8%。该研究证实了LLM与RL协同在自动化算法设计上的巨大潜力,为MAEOSMSP提供了一种高效、自适应的求解新范式。
  • 图  1  融合大语言模型与强化学习的敏捷卫星任务分配算法设计框架示意图

    图  2  种群初始化提示词

    图  3  个体交叉和变异提示词

    图  4  各算法相较于AAD的相对性能差距示意图

    图  5  AAD在不同问题规模(任务数-卫星数)下的性能优势热力图

    图  6  所有算法的性能值分布

    图  7  场景300下,AAD产生的最优分配算子的代码

    1  自适应分配算法设计(AAD)框架

     输入:MAEOSMSP训练场景$ \mathcal{S}_{\mathrm{train}} $,LLM-EC进化代数
     $ G\mathrm{_{EC}} $,种群大小$ N\mathrm{_{pop}} $,交叉概率$ p\mathrm{_c\mathrm{ }} $,变异概率$ p\mathrm{_m} $,子代规模
     $ N\mathrm{_o} $,学习率$ \alpha $,折扣因子$ \gamma $,训练迭代次数$ E\mathrm{_{max}} $
     输出:最优任务分配算法库$ \mathcal{A} $,训练好的RL策略$ {Q}^{\mathrm{*}} $
     /* 第一阶段:基于多场景进化的离线算法库构建 */
     1 初始化算法库$ \mathcal{A}\leftarrow \mathcal{\varnothing } $
     2 for 每个场景$ S\mathit{\mathrm{_{\mathit{\mathrm{\mathrm{k\mathrm{ }}}}}}}\in\mathcal{S}_{\mathrm{train}} $ do:
     3  初始化:使用LLM生成针对场景$ S\mathrm{_{\mathit{\mathrm{k}}}} $的初始种群$ \mathrm{Pop} $
     4  for $ g=1 $ to $ G\mathrm{_{EC}} $ do
     5   评估$ \mathrm{Pop} $中所有个体的适应度(在场景$ S\mathrm{_k} $下的收益)
     6   通过轮盘赌方法选择父代个体
     7   交叉与变异:
     8   for 每一对父代 do
     9    if $ \text{Random} < p\mathrm{_c} $ then
     10     通过LLM对父代代码进行交叉,生成子代
     11    else
     12     直接保留父代
     13    end if
     14    end for
     15   for 每个子代 do
     16    if $ \text{Random} < p\mathrm{_c} $ then
     17     通过LLM变异提示对代码进行变异
     18    end if
     19   end for
     20   更新种群$ \mathrm{\mathrm{Pop}\mathrm{ }} $,保留最优个体
     21   end for
     22   $ \pi_{\mathrm{\mathit{k}}}^{\mathrm{*}}\leftarrow $从$ \mathrm{Pop} $中选取的最优算法
     23   $ \mathcal{A}\leftarrow \mathcal{A}\cup \{\pi _{k}^{\mathrm{*}}\} $
     24 end for
       /* 第二阶段:基于求解状态的RL策略训练 */
     25 初始化Q表$ Q(\boldsymbol{s},a)\leftarrow 0 $
     26 for episode $ e=1 $ to $ E\mathrm{_{max}} $ do
     27   随机选择场景$ S\in\mathcal{S}\mathrm{_{train}} $,初始化解$ {\boldsymbol{x}}_{0} $
     28   初始化求解状态$ {\boldsymbol{s}}_{0} $
     29   while 未满足终止条件 do
     30    动作选择:基于当前状态$ {\boldsymbol{s}}_{t} $,使用$\epsilon $-greedy策略从库
     $ \mathcal{A} $中选择算法$ a\mathit{\mathrm{_{\mathit{\mathit{t\mathit{\mathit{ }}}}}}\mathit{\mathit{ }}} $
     31    执行:将算法$ a\mathrm{_{\mathit{t}}} $应用于当前解$ {\boldsymbol{x}}_{t} $,通过单星调度得到
     新解$ {\boldsymbol{x}}_{t+1} $
     32    奖励计算:计算收益$ {r}_{t}=f({\boldsymbol{x}}_{t+1})-f({\boldsymbol{x}}_{t}) $及额外激励
     33    状态观测:观测新的求解状态$ {\boldsymbol{s}}_{t+1} $
     34    Q值更新:$Q({\boldsymbol{s}}_{t},{a}_{t})\leftarrow Q({\boldsymbol{s}}_{t},{a}_{t})+ $
         $ \alpha [{r}_{t}+\gamma \underset{a\mathrm{'}}{\max }Q({\boldsymbol{s}}_{t+1},a\mathrm{'})-Q({\boldsymbol{s}}_{t},{a}_{t})] $
     35    更新当前解与状态:$ {\boldsymbol{x}}_{t}\leftarrow {\boldsymbol{x}}_{t+1},{\boldsymbol{s}}_{t}\leftarrow {\boldsymbol{s}}_{t+1} $
     36   end while
     37 end for
     38 return $ \mathcal{A},{Q}^{\mathrm{*}} $
    下载: 导出CSV

    表  1  对比算法在求解MAEOSMSP上的收益结果对比

    Scen. AAD LLM-Only RL-Only DQN-QA AALNS RALNS
    Profit Profit Gap Profit Gap Profit Gap Profit Gap Profit Gap
    300-2 1405.8 1398.5 0.5 1316.8 6.3 1364.2 3.0 1407.2 –0.1 1407.2 –0.1
    400-2 1693 1669.8 1.4 1687.5 0.3 1647.9 2.7 1655.9 2.2 1655.9 2.2
    400-3 1969.9 1924.8 2.3 1808.8 8.2 1810.4 8.1 1890.5 4.0 1868.5 5.1
    500-2 1836.9 1852.1 –0.8 1782 3.0 1762.4 4.1 1797.7 2.1 1786.4 2.8
    500-3 2388.2 2364.5 1.0 2273.1 4.8 2253.8 5.6 2286.1 4.3 2262 5.3
    500-4 2586.2 2570.6 0.6 2408 6.9 2408 6.9 2390.2 7.6 2375.6 8.1
    600-2 2004.7 2007.8 –0.2 1961.5 2.2 1944.5 3.0 1999 0.3 1980.7 1.2
    600-3 2569.4 2555.5 0.5 2532.8 1.4 2522.3 1.8 2422.4 5.7 2439.1 5.1
    600-4 2989.4 2960.1 1.0 2926.1 2.1 2895.1 3.2 2804.8 6.2 2762.6 7.6
    600-5 3067.2 3034.2 1.1 2951.1 3.8 2936.7 4.3 2898.2 5.5 2813.4 8.3
    700-2 2260 2263.2 –0.1 2217.1 1.9 2234.3 1.1 2214.3 2.0 2232.6 1.2
    700-3 2762.4 2730.7 1.1 2783.9 –0.8 2790.9 –1.0 2626.7 4.9 2656.4 3.8
    700-4 3246.3 3225 0.7 3203.1 1.3 3189.3 1.8 3062.8 5.7 3025.1 6.8
    700-5 3646.3 3640.1 0.2 3548.6 2.7 3544.5 2.8 3405.1 6.6 3301.6 9.5
    700-6 3865.6 3839.5 0.7 3815.2 1.3 3795.6 1.8 3766.5 2.6 3487.5 9.8
    下载: 导出CSV

    表  2  AAD框架在不同规模场景下的离线训练时间开销(h)

    场景规模算法库构建策略训练总离线训练时间
    300-2~ 1.5~ 0.3~ 1.8
    500-3~ 2.5~ 0.5~ 3.0
    700-6~ 4.0~ 1.0~ 5.0
    下载: 导出CSV

    表  3  演化分配算子的多标准评估方法概述

    评估标准 具体描述 关注领域 专家对比
    优先级与可见性综合评分 结合任务的内在优先级、对特定卫星的总有效可见时长,并根据任务执行时长进行归一化,形成基础价值分数:$ {S}_{\text{base}}(i,j)=\dfrac{{P}_{i}\cdot {V}_{ij}}{{D}_{i}+\epsilon } $ 任务价值与机会 融合多种单一规则,
    实现综合量化评估
    姿态适宜性
    评分
    引入复杂的姿态成本函数,惩罚超出理想范围的姿态角,并结合姿态剖面的统计特性(均值与标准差)来量化姿态控制的稳定性与综合成本。该评分为成本的倒数:$ {S}_{\text{att}}(i,j)=\dfrac{1}{{C}_{\text{att}}(i,j)+\epsilon } $ 卫星姿态动力学 新颖概念:超越了硬约束检查,引入统计指标来评估姿态剖面的整体质量
    动态负载
    均衡因子
    根据卫星当前的负载情况,动态调整其被分配任务的可能性。负载越高,该因子越低,从而在系统层面促进任务的均衡分配:$ {F}_{\text{load}}(j)=\mathrm{max}(0.5,1-\dfrac{{L}_{j}}{{N}_{\text{rem}}}) $ 系统资源均衡 在每次决策中实现动态均衡,而非静态或后置调整
    历史性能激励 为被成功分配任务的卫星累积一个性能因子,模拟一种“优胜劣汰”的自适应机制,使其在后续分配中更具竞争力:$ H_{j}^{(t+1)}=(1+\beta )\cdot H_{j}^{(t)} $ 卫星性能演化 新颖概念:引入自适应学习机制,打破了卫星能力固定的传统假设
    地理/时间分布均衡度 评估将任务分配给某卫星后,是否有助于该卫星任务集的时空聚集性,鼓励其对已覆盖区域或已安排任务的日期进行强化观测:$ {S}_{\mathrm{geo}/\text{temp}}(i,j)\propto \sum (\text{History}+1) $ 时空覆盖策略 新颖概念:从微观的任务满足,转向宏观的系统级时空布局优化
    下载: 导出CSV

    表  4  数学符号定义

    符号定义
    $ i,j $分别代表任务索引和卫星索引
    $ {S}_{\text{base}}(i,j) $将任务$ i $分配给卫星$ j $的基础价值分数
    $ {S}_{\text{att}}(i,j) $将任务$ i $分配给卫星$ j $的姿态适宜性分数
    $ {S}_{\text{final}}(i,j) $将任务$ i $分配给卫星$ j $的最终综合得分
    $ {P}_{i} $任务$ i $的优先级
    $ {V}_{ij} $$ j $对任务$ i $的总有效可见时长
    $ {D}_{i} $任务$ i $所需的执行时长
    $ {C}_{\text{att}}(i,j) $卫星$ j $执行任务$ i $的综合姿态成本
    $ {F}_{\text{load}}(j) $卫星$ j $的动态负载均衡因子
    $ {L}_{j} $卫星$ j $的当前负载(已分配任务数)
    $ {N}_{\text{rem}} $系统中剩余未分配的任务总数
    $ H_{j}^{(t)} $卫星$ j $在决策轮次$ t $的历史性能因子
    $ \beta $历史性能因子的增长率(常数)
    $ {S}_{\mathrm{geo}/\text{temp}}(i,j) $分配任务$ i $给卫星$ j $的时空分布均衡度分数
    $ \epsilon $一个极小的正数,用于防止分母为零
    下载: 导出CSV
  • [1] ZHAO Qiang, YU Le, DU Zhenrong, et al. An overview of the applications of Earth observation satellite data: Impacts and future trends[J]. Remote Sensing, 2022, 14(8): 1863. doi: 10.3390/rs14081863.
    [2] HE Lei, DE WEERDT M, and YORKE-SMITH N. Time/sequence-dependent scheduling: The design and evaluation of a general purpose tabu-based adaptive large neighbourhood search algorithm[J]. Journal of Intelligent Manufacturing, 2020, 31(4): 1051–1078. doi: 10.1007/s10845-019-01518-4.
    [3] WANG Xinwei, WU Guohua, XING Lining, et al. Agile Earth observation satellite scheduling over 20 years: Formulations, methods, and future directions[J]. IEEE Systems Journal, 2021, 15(3): 3881–3892. doi: 10.1109/JSYST.2020.2997050.
    [4] PENG Guansheng, SONG Guopeng, HE Yongming, et al. Solving the agile Earth observation satellite scheduling problem with time-dependent transition times[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(3): 1614–1625. doi: 10.1109/TSMC.2020.3031738.
    [5] CHO D H, KIM J H, CHOI H L, et al. Optimization-based scheduling method for agile Earth-observing satellite constellation[J]. Journal of Aerospace Information Systems, 2018, 15(11): 611–626. doi: 10.2514/1.I010620.
    [6] WU Jian, YAO Feng, SONG Yanjie, et al. Frequent pattern-based parallel search approach for time-dependent agile earth observation satellite scheduling[J]. Information Sciences, 2023, 636: 118924. doi: 10.1016/j.ins.2023.04.003.
    [7] WEI Luona, CUI Yongqiang, CHEN Ming, et al. Multi-objective neural policy approach for agile earth satellite scheduling problem considering image quality[J]. Swarm and Evolutionary Computation, 2025, 94: 101857. doi: 10.1016/j.swevo.2025.101857.
    [8] LI Lei, DU Yonghao, YAO Feng, et al. Learning memetic algorithm based on variable population and neighborhood for multi-complex target scheduling of large-scale imaging satellites[J]. Swarm and Evolutionary Computation, 2025, 92: 101789. doi: 10.1016/j.swevo.2024.101789.
    [9] 彭观胜, 宋国鹏, 刘晓路, 等. 考虑时间依赖收益特性的敏捷卫星调度问题[J]. 运筹与管理, 2024, 33(8): 1–7. doi: 10.12005/orms.2024.0243.

    PENG Guansheng, SONG Guopeng, LIU Xiaolu, et al. Agile earth observation satellite scheduling with time-dependent profits[J]. Operations Research and Management Science, 2024, 33(8): 1–7. doi: 10.12005/orms.2024.0243.
    [10] DU Yonghao, WANG Tao, XIN Bin, et al. A data-driven parallel scheduling approach for multiple agile earth observation satellites[J]. IEEE Transactions on Evolutionary Computation, 2020, 24(4): 679–693. doi: 10.1109/TEVC.2019.2934148.
    [11] WANG Xinwei, GU Yi, WU Guohua, et al. Robust scheduling for multiple agile Earth observation satellites under cloud coverage uncertainty[J]. Computers & Industrial Engineering, 2021, 156: 107292. doi: 10.1016/j.cie.2021.107292.
    [12] SUN Haiquan, XIA Wei, WANG Zhilong, et al. Agile earth observation satellite scheduling algorithm for emergency tasks based on multiple strategies[J]. Journal of Systems Science and Systems Engineering, 2021, 30(5): 626–646. doi: 10.1007/s11518-021-5506-4.
    [13] YAO Feng, CHEN Yingguo, WANG Ling, et al. A bilevel evolutionary algorithm for large-scale multiobjective task scheduling in multiagile earth observation satellite systems[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(6): 3512–3524. doi: 10.1109/TSMC.2024.3359265.
    [14] CHEN Jiawei, WANG Feiran, CHEN Yingguo, et al. A generalized bilevel optimization model for large-scale task scheduling in multiple agile earth observation satellites[J]. Knowledge-Based Systems, 2025, 309: 112809. doi: 10.1016/j.knosys.2024.112809.
    [15] HE Yongming, WU Guohua, CHEN Yingwu, et al. A two-stage framework and reinforcement learning-based optimization algorithms for complex scheduling problems[J]. arXiv preprint arXiv: 2103.05847, 2021.
    [16] FENG Xiaoen, LI Yuqing, and XU Minqiang. Multi-satellite cooperative scheduling method for large-scale tasks based on hybrid graph neural network and metaheuristic algorithm[J]. Advanced Engineering Informatics, 2024, 60: 102362. doi: 10.1016/j.aei.2024.102362.
    [17] LI Rui, WANG Ling, SANG Hongyan, et al. LLM-assisted automatic memetic algorithm for lot-streaming hybrid job shop scheduling with variable sublots[J]. IEEE Transactions on Evolutionary Computation, 2025, Early Access. doi: 10.1109/TEVC.2025.3556186.
    [18] LIU Fei, TONG Xialiang, YUAN Mingxuan, et al. Algorithm evolution using large language model[J]. arXiv preprint arXiv: 2311.15249, 2023.
    [19] LIU Fei, TONG Xialiang, YUAN Mingxuan, et al. Evolution of heuristics: Towards efficient automatic algorithm design using large language model[C]. Forty-first International Conference on Machine Learning, Vienna, Austria, 2024.
    [20] YE Haoran, WANG Jiarui, CAO Zhiguang, et al. ReEvo: Large language models as hyper-heuristics with reflective evolution[C]. The 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 1381. doi: 10.5555/3737916.3739297.
    [21] LI Hanxiao, GAO Kaizhou, DUAN Peiyong, et al. An improved artificial bee colony algorithm with Q-learning for solving permutation flow-shop scheduling problems[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(5): 2684–2693. doi: 10.1109/TSMC.2022.3219380.
    [22] QI Rui, LI Junqing, WANG Juan, et al. QMOEA: A Q-learning-based multiobjective evolutionary algorithm for solving time-dependent green vehicle routing problems with time windows[J]. Information Sciences, 2022, 608: 178–201. doi: 10.1016/j.ins.2022.06.056.
    [23] CHEN Ming, DU Yonghao, TANG Ke, et al. Learning to construct a solution for the agile satellite scheduling problem with time-dependent transition times[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(10): 5949–5963. doi: 10.1109/TSMC.2024.3411640.
    [24] GONG Yupeng, LI Anshou, and PENG Xuan. Geometrical design method of Walker constellation in non-terrestrial network[J]. Acta Astronautica, 2024, 219: 618–626. doi: 10.1016/j.actaastro.2024.03.069.
    [25] KOLAEE M H, JABBARZADEH A, and AL-E-HASHEM S M J M. Sustainable group tourist trip planning: An adaptive large neighborhood search algorithm[J]. Expert Systems with Applications, 2024, 237: 121375. doi: 10.1016/j.eswa.2023.121375.
    [26] JIA Shengyang, DENG Lei, ZHAO Quanwu, et al. An adaptive large neighborhood search heuristic for multi-commodity two-echelon vehicle routing problem with satellite synchronization[J]. Journal of Industrial and Management Optimization, 2023, 19(2): 1187–1210. doi: 10.3934/jimo.2021225.
    [27] WU Guohua, XIANG Zhiqing, WANG Yalin, et al. Improved adaptive large neighborhood search algorithm based on the two-stage framework for scheduling multiple super-agile satellites[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(5): 7185–7200. doi: 10.1109/TAES.2024.3416427.
    [28] HE Lei, LIU Xiaolu, LAPORTE G, et al. An improved adaptive large neighborhood search algorithm for multiple agile satellites scheduling[J]. Computers & Operations Research, 2018, 100: 12–25. doi: 10.1016/j.cor.2018.06.020.
  • 加载中
图(7) / 表(5)
计量
  • 文章访问数:  66
  • HTML全文浏览量:  34
  • PDF下载量:  14
  • 被引次数: 0
出版历程
  • 收稿日期:  2025-09-26
  • 修回日期:  2026-01-06
  • 录用日期:  2026-01-06
  • 网络出版日期:  2026-01-10

目录

    /

    返回文章
    返回