高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于深度强化学习的自适应大邻域搜索算法在成像卫星调度问题中的应用

魏普远 何磊

魏普远, 何磊. 基于深度强化学习的自适应大邻域搜索算法在成像卫星调度问题中的应用[J]. 电子与信息学报. doi: 10.11999/JEIT251009
引用本文: 魏普远, 何磊. 基于深度强化学习的自适应大邻域搜索算法在成像卫星调度问题中的应用[J]. 电子与信息学报. doi: 10.11999/JEIT251009
WEI Puyuan, HE Lei. A Deep Reinforcement Learning Enhanced Adaptive Large Neighborhood Search for Imaging Satellite Scheduling[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251009
Citation: WEI Puyuan, HE Lei. A Deep Reinforcement Learning Enhanced Adaptive Large Neighborhood Search for Imaging Satellite Scheduling[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251009

基于深度强化学习的自适应大邻域搜索算法在成像卫星调度问题中的应用

doi: 10.11999/JEIT251009 cstr: 32379.14.JEIT251009
基金项目: 国家自然科学基金(72571283),中国科协青年人才托举工程(2022QNRC001)
详细信息
    作者简介:

    魏普远:男,硕士生,研究方向为卫星调度优化

    何磊:男,副教授,研究方向为卫星调度优化

    通讯作者:

    何磊 helei@nudt.edu.cn

  • 11https://tianzhibei.com/normalizedRemoteSensingField
  • 中图分类号: TP181;V474.2

A Deep Reinforcement Learning Enhanced Adaptive Large Neighborhood Search for Imaging Satellite Scheduling

Funds: The National Natural Science Foundation of China (72571283), Young Elite Scientists Sponsorship Program (2022QNRC001)
  • 摘要: 卫星调度问题(Satellite Scheduling Problem, SSP)是典型的NP-hard组合优化问题,其目标是在满足一系列复杂物理和运行约束的条件下,最大化观测收益或完成任务数量。自适应大邻域搜索(Adaptive Large Neighborhood Search, ALNS)是求解此类问题的有效元启发式算法,其性能高度依赖于破坏与修复算子的选择策略。传统ALNS算法通常采用基于历史表现的启发式计分机制来调整算子选择概率,这种机制受参数影响较大且无法动态适应搜索过程中的复杂状态变化。为解决此问题,该文提出了一种基于深度强化学习(Deep Reinforcement Learning, DRL)的自适应大邻域搜索算法(DR-ALNS)。该算法将算子选择过程建模为一个马尔可夫决策过程,利用深度强化学习智能体在每次迭代中根据当前解的状态动态选择最合适的破坏与修复算子。通过端到端的学习,DRL智能体能够学习到一种隐式的、高效的算子选择策略,从而更好地引导搜索方向,提升算法的全局探索与局部开发能力。通过在标准的卫星调度问题测试集上进行实验,可以看出DR-ALNS算法产生的解的质量优于传统ALNS及其他算法。
  • 图  1  图中任务1和任务2的观测窗口有重叠部分,故两个任务存在冲突,卫星只能选择其中一个目标进行观测

    图  2  DR-ALNS框架图

    图  3  DR-ALNS训练曲线

    图  4  实验得分

    表  1  符号说明

    符号 描述
    $ \text{sa{t}}_{j} $ 卫星集合中的第$ j $个卫星
    $ {O}_{ij} $ 卫星$ \text{sa{t}}_{j} $可以观测到任务$ {m}_{i} $的窗口集合
    $ {O}_{ijk} $ $ {O}_{ij} $中的第$ k $个时间窗
    $ \text{s{t}}_{ijk} $ $ {O}_{ijk} $的开始时间
    $ \text{e{t}}_{ijk} $ $ {O}_{ijk} $的结束时间
    $ \text{du{r}}_{i} $ 任务$ {m}_{i} $的持续观测时间
    $ \text{pri}_{i} $ 任务$ {m}_{i} $的收益值
    $ \text{tran}{s}_{i{{i}^{\prime}}} $ 卫星从一个任务$ {m}_{i} $转换到任务$ {m}_{{{i}^{\prime}}} $所需的转换时间
    $ {x}_{ijk} $ 用来指示$ {m}_{i} $是否被执行,值为1时表示被执行,
    为0时表示未被执行
    $ X $ 时间窗的决策空间
    $ \text{s{t}}_{i} $ 执行任务$ {m}_{i} $的开始时间
    $ \text{e{t}}_{i} $ 执行任务$ {m}_{i} $的结束时间
    $ \text{SE} $ 计划执行任务集合
    下载: 导出CSV

    1  DR-ALNS

     输入:步骤数M、策略、破坏算子集合、修复算子集合、问题实例
     初始化:最优解$ x\mathrm{_{best}} $=初始解$ x $,初始状态$ s_t=\text{initial} $$ \mathrm{state} $
     (1) while 停止准则未满足 do
     (2)  for t=0 to M-1 do
     (3)   基于当前状态$ {s}_{t} $通过策略$ {\pi }_{\theta } $选择动作$ {a}_{t} $
     (4)   基于动作$ {a}_{t} $,选择破坏算子$ d $和修复算子$ r $,并配置破坏
        强度与接受准则
     (5)   $ {x}_{t}=r(d(x)) $
     (6)   if $ \mathrm{\text{accept}}(x_t,x) $then
     (7)    $ x={x}_{t} $
     (8)   if 新解成本更低then
     (9)    $ x\mathrm{_\text{best}}=x_t $
     (10)   更新状态$ {s}_{t} $并获取奖励$ {R}_{t} $
     (11) (训练模式下)更新策略$ {\pi }_{\theta } $
     (12) return (部署模式下)$ x\mathrm{_{best}} $
    下载: 导出CSV

    表  2  DR-ALNS的状态空间特征

    特征描述
    current_score这是当前解的得分
    best_score算法从开始到目前为止找到的
    历史最优解的得分
    improvement_percentage表示当前解相对于历史最优解的
    提升百分比
    iterations_since_best自上一次找到新的最优解以来,
    已经过去了多少次迭代
    current_iter当前的迭代次数
    temperature模拟退火接受准则中当前的温度值
    下载: 导出CSV

    表  3  PPO参数设置

    参数取值
    n_steps2 048
    batch_size64
    n_epochs10
    gamma0.99
    gae_lambda0.95
    clip_range0.2
    learning_rate0.0003
    vf_coef0.5
    max_grad_norm0.5
    下载: 导出CSV

    表  4  得分分析

    算法小场景平均得分中场景平均得分大场景平均得分
    DR-ALNS252.18845.591979.73
    ALNS250.62820.761870.65
    A-ALNS250.54826.251904.31
    GRILS250.66791.681730.04
    下载: 导出CSV

    表  5  各场景中获得最优解迭代次数和运行时间

    算法小场景中场景大场景
    迭代次数运行时间(s)迭代次数运行时间(s)迭代次数运行时间(s)
    DR-ALNS120.63.687202.57.548661.816.845
    ALNS158.64.405461.512.365178732.684
    A-ALNS142.84.318350.510.6251368.430.685
    GRILS128.14.139384.910.9831108.629.584
    下载: 导出CSV
  • [1] KASLOW D. Planning and scheduling of earth observing satellites[C]. 2007 IEEE Aerospace Conference, Big Sky, MT, USA, 2007: 1-12.DOI: 10.1109/AERO.2007.352824.
    [2] CHEN Ming, DU Yonghao, TANG Ke, et al. Learning to construct a solution for the agile satellite scheduling problem with time-dependent transition times[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(10): 5949–5963. doi: 10.1109/TSMC.2024.3411640.
    [3] LIU Quanyong, PENG Jiangtao, ZHANG Genwei, et al. Deep contrastive learning network for small-sample hyperspectral image classification[J]. Journal of Remote Sensing, 2023, 3: 0025. doi: 10.34133/remotesensing.0025.
    [4] HAN Zonghao, ZHANG Ziye, ZHANG Shun, et al. Aerial visible-to-infrared image translation: Dataset, evaluation, and baseline[J]. Journal of Remote Sensing, 2023, 3: 0096. doi: 10.34133/remotesensing.0096.
    [5] MEI Shaohui, LIAN Jiawei, WANG Xiaofei, et al. A comprehensive study on the robustness of deep learning-based image classification and object detection in remote sensing: Surveying and benchmarking[J]. Journal of Remote Sensing, 2024, 4: 0219. doi: 10.34133/remotesensing.0219.
    [6] KRIGMAN S, GRINSHPOUN T, and DERY L. Scheduling of Earth observing satellites using distributed constraint optimization[J]. Journal of Scheduling, 2024, 27(5): 507–524. doi: 10.1007/s10951-024-00816-x.
    [7] WU Jian, CHEN Yuning, HE Yongming, et al. Survey on autonomous task scheduling technology for Earth observation satellites[J]. Journal of Systems Engineering and Electronics, 2022, 33(6): 1176–1189. doi: 10.23919/JSEE. 2022.000141.
    [8] FENG Xiaoen, LI Yuqing, and XU Minqiang.Multi-satellite cooperative scheduling method for large-scaletasks based on hybrid graph neural network andmetaheuristic algorithm[J]. Advanced EngineeringInformatics, 2024, 60: 102362. doi: 10.1016/j.aei.2024.102362.
    [9] PISINGER D and ROPKE S. A general heuristic for vehicle routing problems[J]. Computers & Operations Research, 2007, 34(8): 2403–2435. doi: 10.1016/j.cor.2005.09.012.
    [10] DOERR C, DREO J, and KERSCHKE P. Making a case for (Hyper-)parameter tuning as benchmark problems[C]. The Genetic and Evolutionary Computation Conference Companion, Prague, Czech Republic, 2019: 1755–1764. doi: 10.1145/3319619.3326857.
    [11] SANTINI A, ROPKE S, and HVATTUM L M. A comparison of acceptance criteria for the adaptive large neighbourhood search metaheuristic[J]. Journal of Heuristics, 2018, 24(5): 783–815. doi: 10.1007/s10732-018-9377-x.
    [12] WU Guohua, XIANG Zhiqing, WANG Yalin, et al. Improved adaptive large neighborhood search algorithm based on the two-stage framework for scheduling multiple super-agile satellites[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(5): 7185–7200. doi: 10.1109/TAES.2024.3416427.
    [13] BENAVIDES-ROBLES M T, VALENCIA-RIVERA G H, CRUZ-DUARTE J M, et al. Robotic mobile fulfillment system: A systematic review[J]. IEEE Access, 2024, 12: 16767–16782. doi: 10.1109/ACCESS.2024.3359434.
    [14] SHARIAT R and HUANG Kai. A large-scale neighborhood search algorithm for multi-activity tour scheduling problems[J]. Journal of Heuristics, 2024, 30(5): 225–267. doi: 10.1007/s10732-024-09527-0.
    [15] BOGYRBAYEVA A, MERALIYEV M, MUSTAKHOV T, et al. Machine learning to solve vehicle routing problems: A survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(6): 4754–4772. doi: 10.1109/TITS.2023.3334976.
    [16] FENG Guosheng, WANG Hongzhi, and WANG Chunnan. Search for deep graph neural networks[J]. Information Sciences, 2023, 649: 119617. doi: 10.1016/j.ins.2023.119617.
    [17] WANG Chao, CAO Mingjiu, JIANG Hang, et al. A deep reinforcement learning-based adaptive large neighborhood search for capacitated electric vehicle routing problems[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8(5): 3907–3919. doi: 10.1109/TETCI.2024.3364966.
    [18] LAND A H and DOIG A G. An automatic method of solving discrete programming problems[J]. Econometrica, 1960, 28(3): 497–520. doi: 10.2307/1910129.
    [19] BELLMAN R and KALABA R. Dynamic Programming and Modern Control Theory[M]. New York, USA: Academic Press, 1965: 1–112.
    [20] ANTUORI V, WOJTOWICZ D T, and HEBRARD E. Solving the agile Earth observation satellite scheduling problem with CP and local search[C]. 31st International Conference on Principles and Practice of Constraint Programming, Glasgow, UK, 2025: 3: 1–3: 22. doi: 10.4230/LIPIcs.CP.2025.3.
    [21] 赵静, 赵卫虎, 李勇军, 等. 基于改进NSGA-Ⅱ算法的微波/光混合链路中继卫星多目标资源调度算法[J]. 中国激光, 2013, 40(12): 1205003. doi: 10.3788/CJL201340.1205003.

    ZHAO Jing, ZHAO Weihu, LI Yongjun, et al. Multi-objective resources scheduling algorithm for microwave and laser hybrid links data relay satellite based on improved NSGA-Ⅱ algorithm[J]. Chinese Journal of Lasers, 2013, 40(12): 1205003. doi: 10.3788/CJL201340.1205003.
    [22] KANDEPI R, SAINI H, GEORGE R K, et al. Agile earth observation satellite constellations scheduling for large area target imaging using heuristic search[J]. Acta Astronautica, 2024, 219: 670–677. doi: 10.1016/j.actaastro.2024.03.027.
    [23] WU Jian, SONG Bingyu, ZHANG Guoting, et al. A data-driven improved genetic algorithm for agile earth observation satellite scheduling with time-dependent transition time[J]. Computers & Industrial Engineering, 2022, 174: 108823. doi: 10.1016/j.cie.2022.108823.
    [24] HAN Chao, GU Yan, WU Guohua, et al. Simulated annealing-based heuristic for multiple agile satellites scheduling under cloud coverage uncertainty[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 53(5): 2863-2874.DOI: 10.1109/TSMC.2022.3218579.
    [25] HE Lei, LIU Xiaolu, LAPORTE G, et al. An improved adaptive large neighborhood search algorithm for multiple agile satellites scheduling[J]. Computers & Operations Research, 2018, 100: 12–25. doi: 10.1016/j.cor.2018.06.020.
    [26] YAO Feng, CHEN Yingguo, WANG Ling, et al. A bilevel evolutionary algorithm for large-scale multiobjective task scheduling in multiagile earth observation satellite systems[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(6): 3512–3524. doi: 10.1109/TSMC.2024.3359265.
    [27] CHEN Ming, LI Bohua, XING Lining, et al. DEEM: A differential evolution ensemble method for optimizing agile satellite scheduling[J]. Expert Systems with Applications, 2025, 272: 126771. doi: 10.1016/j.eswa.2025.126771.
    [28] LIU Xiaolu, LAPORTE G, CHEN Yingwu, et al. An adaptive large neighborhood search metaheuristic for agile satellite scheduling with time-dependent transition time[J]. Computers & Operations Research, 2017, 86: 41–53. doi: 10.1016/j.cor.2017.04.006.
    [29] DA ROS F, SOPRANO M, DI GASPERO L, et al. Large language models for combinatorial optimization: A systematic review[EB/OL]. https://arxiv.org/abs/2507.03637, 2025.
    [30] LIU Zheng, XIONG WeI, HAN Chi, et al. A multi-pointer network for multiple agile optical satellite scheduling problem[J]. Aerospace, 2024, 11(10): 792. doi: 10.3390/aerospace11100792.
    [31] CHEN Ming, CHEN Yuning, CHEN Yingwu, et al. Deep reinforcement learning for agile satellite scheduling problem[C]. 2019 IEEE Symposium Series on Computational Intelligence, Xiamen, China, 2019: 126–132. doi: 10.1109/SSCI44817.2019.9002957.
    [32] CHUN Jie, YANG Wenyuan, LIU Xiaolu, et al. Deep reinforcement learning for the agile earth observation satellite scheduling problem[J]. Mathematics, 2023, 11(19): 4059. doi: 10.3390/math11194059.
    [33] MAZYAVKINA N, SVIRIDOV S, IVANOV S, et al. Reinforcement learning for combinatorial optimization: A survey[J]. Computers & Operations Research, 2021, 134: 105400. doi: 10.1016/j.cor.2021.105400.
    [34] LONG Yaosong, SHAN Chengjun, SHANG Wei, et al. Deep reinforcement learning-based approach with varying-scale generalization for the earth observation satellite scheduling problem considering resource consumptions and supplements[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(3): 2572–2585. doi: 10.1109/TAES.2024.3379169.
    [35] PENG Guansheng, SONG Guopeng, HE Yongming, et al. Solving the agile Earth observation satellite scheduling problem with time-dependent transition times[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(3): 1614–1625. doi: 10.1109/TSMC.2020.3031738.
  • 加载中
图(4) / 表(6)
计量
  • 文章访问数:  46
  • HTML全文浏览量:  11
  • PDF下载量:  5
  • 被引次数: 0
出版历程
  • 收稿日期:  2025-09-26
  • 修回日期:  2026-01-09
  • 录用日期:  2026-01-13
  • 网络出版日期:  2026-01-13

目录

    /

    返回文章
    返回