A Task Prediction-Augmented Hierarchical Offloading Method for Space-Air-Ground Integrated Networks
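Summary: Space-air-ground integrated networks (SAGIN) combine low Earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground devices into an efficient converged architecture for computation-intensive mobile applications. However, UAV trajectory control, task offloading, and resource allocation are strongly coupled, and task workloads are dynamic and uncertain, so efficient task offloading that balances latency and energy consumption remains challenging. This paper takes minimization of the weighted cost of task completion latency and UAV flight energy consumption as the optimization objective, models the problem as a decentralized partially observable Markov decision process (DEC-POMDP), and proposes a task prediction-augmented multi-agent proximal policy optimization algorithm (PA-MAPPO). The method introduces a lightweight task workload prediction module into the multi-agent reinforcement learning framework to strengthen the agents' forward-looking decision-making, thereby jointly optimizing UAV trajectory planning, task offloading, and computing resource allocation in dynamic SAGIN environments. Simulation results show that the proposed algorithm effectively reduces the overall cost and strikes a good balance between average task latency and flight energy consumption, confirming its effectiveness in dynamic SAGIN environments.

Abstract: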
Objective: Space-Air-Ground Integrated Networks (SAGIN) have emerged as a critical infrastructure for future 6G communications, enabling wide-area coverage and flexible deployment through the collaborative operation of Low Earth Orbit (LEO) satellites, Unmanned Aerial Vehicles (UAVs), and ground users (GUs). With the rapid proliferation of Internet of Things (IoT), Internet of Vehicles (IoV), and smart city applications, the volume and diversity of computation-intensive tasks generated by terminal devices have grown substantially, placing stringent demands on real-time computing and resource scheduling. Integrating Mobile Edge Computing (MEC) into SAGIN architectures enables near-user computation services by deploying UAVs and satellites as edge computing nodes, which reduces task latency. However, achieving efficient task offloading that simultaneously minimizes task completion latency and UAV energy consumption remains a significant challenge. This difficulty arises from the strong coupling among UAV trajectory planning, task offloading decisions, and resource allocation, compounded by the highly dynamic and partially observable nature of SAGIN environments. In particular, existing multi-agent reinforcement learning (MARL) approaches predominantly rely on reactive, instantaneous decision-making without proactive awareness of future task workload variations, leading to decision lag and insufficient adaptability under bursty traffic conditions. To address these challenges, this paper proposes a task prediction-augmented MARL framework that endows agents with forward-looking decision capabilities in dynamic SAGIN environments.

Methods: The system considers a three-layer SAGIN-MEC architecture comprising one LEO satellite, multiple UAVs, and ground users. Tasks can be processed locally, offloaded to UAVs via Ground-to-Air (G2A) links, or further relayed to the LEO satellite via Air-to-Satellite (A2S) links under a partial offloading mechanism.
The joint optimization of UAV trajectory, user association, offloading ratios, and computational resource allocation is formulated as a Mixed Integer Nonlinear Programming (MINLP) problem that minimizes the weighted sum of average task latency and UAV flight energy consumption. Because the problem is non-convex and high-dimensional, it is reformulated as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP), upon which a Prediction-Augmented Multi-Agent Proximal Policy Optimization (PA-MAPPO) algorithm is proposed. A lightweight Exponential Smoothing–Autoregressive (ES-AR) prediction module generates multi-step workload forecasts that are incorporated into each agent's state space. The algorithm adopts a bilevel structure: the outer layer employs Centralized Training and Decentralized Execution (CTDE)-based PA-MAPPO to generate UAV trajectory actions, while the inner layer applies Block Coordinate Descent (BCD) convex optimization to solve the resource allocation and offloading subproblems, with closed-form solutions derived via Lagrangian analysis. Generalized Advantage Estimation (GAE) and PPO-Clip mechanisms ensure training stability and convergence.

Results and Discussions: Simulations involve 1 LEO satellite, 5 UAVs, and 50 ground users in a 1 km × 1 km area. PA-MAPPO is compared against MAPPO (without prediction) and PA-MADDPG. Training curves show that PA-MAPPO converges within 500–700 episodes with the highest average reward and smallest variance, demonstrating superior stability (Fig. 3). As the user count increases from 20 to 80, PA-MAPPO consistently maintains the lowest system cost, achieving average reductions of 12.4% and 18.7% relative to MAPPO and PA-MADDPG, respectively (Fig. 4). Experiments varying the number of UAVs reveal a U-shaped cost curve for all algorithms, with the optimal configuration at U=5, where PA-MAPPO achieves the minimum cost (Fig. 5).
Sensitivity analysis over the energy-latency tradeoff weight ω confirms PA-MAPPO's robustness across different optimization preferences (Fig. 6). The prediction horizon H has a non-monotonic effect on performance: H=5 yields the best result, with approximately 14.9% cost reduction over the no-prediction case, while longer horizons degrade performance due to accumulated prediction error (Fig. 7).

Conclusions: This paper proposes the PA-MAPPO algorithm to address the joint optimization of UAV trajectory planning, user association, task offloading, and computational resource allocation in dynamic SAGIN environments. By introducing a lightweight ES-AR task workload prediction module into the MARL framework, the proposed method equips UAV agents with proactive decision-making capabilities that account for future task dynamics, effectively alleviating the decision lag inherent in purely reactive approaches. The inner BCD-based convex optimization guarantees convergence to a KKT-stationary point, while the outer CTDE-based PPO mechanism ensures training stability and scalability. Simulation results demonstrate that PA-MAPPO achieves significant improvements over baseline methods in terms of average task latency, UAV flight energy consumption, and overall system cost, while exhibiting strong scalability and robustness across varying system configurations. Future work will explore online prediction and decision co-optimization mechanisms in multi-satellite cooperative scenarios, as well as the impact of dynamic network topology changes on algorithm performance.
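The abstract names an ES-AR module that feeds multi-step workload forecasts into each agent's state, but it does not give the module's equations. The following is only a minimal sketch of one plausible reading: simple exponential smoothing followed by a least-squares AR(1) fit on the smoothed series, rolled forward recursively for H steps. The AR order, the smoothing factor `alpha`, the function names, and the observation layout are all assumptions made for illustration, not the authors' implementation.

```python
def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        # Constant history: fall back to forecasting its level.
        return mean_curr, 0.0
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = cov / var_prev
    return mean_curr - phi * mean_prev, phi


def forecast_workload(history, horizon=5, alpha=0.5):
    """Recursive multi-step forecast (assumed ES + AR(1) combination)."""
    if len(history) < 3:
        raise ValueError("need at least 3 past workload samples")
    smoothed = exponential_smoothing(history, alpha)
    c, phi = fit_ar1(smoothed)
    forecasts, last = [], smoothed[-1]
    for _ in range(horizon):
        last = c + phi * last  # each step reuses the previous forecast
        forecasts.append(last)
    return forecasts


def augment_observation(local_obs, history, horizon=5, alpha=0.5):
    """Append the forecast vector to a local observation (hypothetical layout)."""
    return list(local_obs) + forecast_workload(history, horizon, alpha)


if __name__ == "__main__":
    past_mb = [0.8, 1.1, 0.9, 1.4, 1.6, 1.3]  # made-up workload samples in Mb
    print(forecast_workload(past_mb, horizon=5))
```

Because each forecast step reuses the previous forecast, any fitting error compounds with the horizon, which is consistent with the reported degradation for horizons longer than H=5.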
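Table 1 Simulation parameters

| Parameter | Value |
| --- | --- |
| Satellite altitude $ {H}_{\text{s}} $ | 500 km |
| Number of UAVs $ U $ | 5 |
| Number of ground users $ G $ | 50 |
| UAV flight altitude $ {H}_{u} $ | 100 m |
| Maximum UAV speed $ {v}_{\text{max}} $ | 30 m/s |
| Maximum UAV acceleration $ {a}_{\text{max}} $ | 5 m/s² |
| Minimum safe distance $ {d}_{\text{min}} $ | 30 m |
| Communication bandwidth $ {B}_{g,u} $, $ {B}_{u,L} $ | 5 MHz, 10 MHz |
| Computing frequency $ f_{g}^{\text{loc}} $, $ f_{u}^{\text{max}} $, $ f_{L}^{\text{max}} $ | 1 GHz, 2 GHz, 10 GHz |
| Task data size $ {D}_{g}(t) $ | 0.5–2.0 Mb (average) |
| Noise power spectral density $ {N}_{0} $ | –174 dBm/Hz |
| Discount factor $ \gamma $ | 0.99 |
| Learning rate | $ 3\times {10}^{-4} $ |
| Prediction window $ H $ | 5 |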
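As a quick check on the link parameters in Table 1, the noise power over each link follows from the noise power spectral density and the bandwidth as $ {N}_{0}+10{\log }_{10}B $ in dB units. The sketch below evaluates it for the two listed bandwidths, giving about –107.0 dBm for the 5 MHz link and –104.0 dBm for the 10 MHz link. The dictionary keys and function names are illustrative assumptions, and the calculation ignores any receiver noise figure, which the table does not specify.

```python
import math

# Values copied from Table 1; the key names are hypothetical.
SIM_PARAMS = {
    "noise_psd_dbm_per_hz": -174.0,  # N_0
    "bandwidth_g2a_hz": 5e6,         # B_{g,u}, ground-to-UAV link
    "bandwidth_a2s_hz": 10e6,        # B_{u,L}, UAV-to-LEO link
}


def noise_power_dbm(psd_dbm_per_hz, bandwidth_hz):
    """Total noise power in dBm over a bandwidth, given a flat PSD in dBm/Hz."""
    return psd_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def dbm_to_watts(power_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((power_dbm - 30.0) / 10.0)


if __name__ == "__main__":
    for link in ("bandwidth_g2a_hz", "bandwidth_a2s_hz"):
        p_dbm = noise_power_dbm(SIM_PARAMS["noise_psd_dbm_per_hz"], SIM_PARAMS[link])
        print(link, round(p_dbm, 2), "dBm", dbm_to_watts(p_dbm), "W")
```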