Federated Deep Reinforcement Learning-based Intelligent Routing Design for LEO Satellite Networks

LI Xuehua; LIAO Hailong; ZHANG Xian; ZHOU Jiaen

doi:10.11999/JEIT250072


cstr: 32379.14.JEIT240072

LI Xuehua^{1, 2},
LIAO Hailong^{1, 2},
ZHANG Xian^{1, 2},
ZHOU Jiaen^{3}

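1. School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing 102206, China
2. Key Laboratory of Modern Measurement and Control Technology, Ministry of Education, Beijing Information Science and Technology University, Beijing 102206, China
3. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China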

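Funds: The National Natural Science Foundation of China (62401066), Beijing Natural Science Foundation (L222004), The Young Backbone Teacher Support Plan of Beijing Information Science & Technology University (BISTU) (YBT 202420), The BISTU Research Foundation (2024XJJ07)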


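Author biographies:
LI Xuehua: female, Professor and doctoral supervisor; research interests include intelligent computing and wireless communications

LIAO Hailong: male, master's student; research interests include intelligent routing for LEO satellite communication networks and reinforcement learning

ZHANG Xian: male, Lecturer and master's supervisor; research interests include non-terrestrial wireless networks and the integration of communication, sensing and computing

ZHOU Jiaen: male, doctoral student; research interests include integrated satellite communication, sensing and computing technologies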

Corresponding author:
ZHANG Xian, zhangxian@bistu.edu.cn

CLC number: TN927.2
Publication history
- Received: 2025-02-12
- Revised: 2025-07-17
- Published online: 2025-07-26
- Issue date: 2025-08-27


Abstract

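Low Earth Orbit (LEO) satellite communication networks have a dynamically changing topology, so traditional terrestrial routing methods are difficult to apply directly. Onboard satellite resources are also limited, so Artificial Intelligence (AI)-based routing methods usually learn inefficiently. Collaborative training, in turn, requires data sharing and transmission, which is difficult and poses data security risks. To address these challenges, this paper proposes a multi-agent Federated Deep Reinforcement Learning (FDRL) routing method based on satellite clustering. First, a routing model for LEO satellite communication networks is designed that combines network topology, communication, and energy consumption. Then, the constellation nodes are partitioned into multiple clusters according to the average connectivity degree of each satellite. An FDRL framework is adopted within each cluster: the satellites in a cluster collaboratively share model parameters to jointly train the cluster's global model, with the goal of maximizing network energy efficiency. Finally, simulations compare the proposed method with three benchmark methods, Sarsa, MAD2QN, and REINFORCE. Against these benchmarks, it increases the average network throughput by 83.7%, 19.8%, and 14.1%, respectively. It also reduces the average packet hop count by 25.0%, 18.9%, and 9.1%, respectively, and improves network energy efficiency by 55.6%, 42.9%, and 45.8%, respectively.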
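The abstract gives only the clustering criterion, which is each satellite's average connectivity degree. The full procedure (Fig. 2) is not part of this section. The following is a minimal sketch under stated assumptions. The unassigned satellite with the highest average inter-satellite link (ISL) degree over several topology snapshots becomes a cluster head. It then absorbs the unassigned satellites within h hops of it, where h plays the role of the maximum intra-cluster hop count in Table 1. The greedy rule, the +Grid topology, the 4 x 4 example, and all function names are hypothetical illustrations, not the authors' algorithm.

```python
from collections import deque


def grid_snapshot(planes, per_plane, broken=()):
    """Hypothetical +Grid ISL topology: 2 intra-plane and 2 inter-plane neighbours."""
    adj = {}
    for p in range(planes):
        for s in range(per_plane):
            adj[(p, s)] = {(p, (s + 1) % per_plane), (p, (s - 1) % per_plane),
                           ((p + 1) % planes, s), ((p - 1) % planes, s)}
    for u, w in broken:  # links that are down in this snapshot
        adj[u].discard(w)
        adj[w].discard(u)
    return adj


def average_degree(snapshots, nodes):
    """Mean number of ISL neighbours of each satellite over all snapshots."""
    return {v: sum(len(adj.get(v, ())) for adj in snapshots) / len(snapshots) for v in nodes}


def within_hops(adj, source, max_hops):
    """BFS from source; returns {node: hop count} for nodes at most max_hops away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def cluster_by_degree(snapshots, nodes, max_hops):
    """Greedy (assumed) rule: the unassigned satellite with the highest average degree
    becomes a head; unassigned satellites within max_hops of it in the latest
    snapshot become its members."""
    avg = average_degree(snapshots, nodes)
    adj = snapshots[-1]
    clusters, assigned = {}, set()
    for head in sorted(nodes, key=lambda v: (-avg[v], v)):
        if head in assigned:
            continue
        members = [v for v in within_hops(adj, head, max_hops) if v not in assigned]
        clusters[head] = members
        assigned.update(members)
    return clusters


snapshots = [grid_snapshot(4, 4), grid_snapshot(4, 4, broken=[((0, 0), (1, 0))])]
nodes = sorted(snapshots[0])
clusters = cluster_by_degree(snapshots, nodes, max_hops=1)
for head, members in clusters.items():
    print("head", head, "members", sorted(members))
```

Every satellite ends up in exactly one cluster, and each head belongs to its own cluster. Averaging the degree over snapshots favours satellites whose links are stable. That stability is the property the abstract relies on when it reports few cluster-member changes.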
Objective: The topology of Low Earth Orbit (LEO) satellite communication networks is highly dynamic, so traditional terrestrial routing methods cannot be applied directly. Because satellites have limited onboard resources, Artificial Intelligence (AI)-based routing methods often learn inefficiently. Collaborative training also requires data sharing and transmission, which is difficult and introduces data security risks. To address these issues, this research introduces Federated Deep Reinforcement Learning (FDRL) into LEO satellite communication networks, using its distributed perception, decision-making, and training capabilities to learn global routing strategies efficiently. Through local model aggregation and global model sharing among satellite nodes, FDRL adapts dynamically to topology changes while preserving data privacy. This produces effective routing decisions and improves the overall routing performance of LEO satellite networks. Furthermore, integrating Federated Learning (FL) into the LEO satellite network lets the constellation train autonomously within regions without transmitting raw data to Ground Stations (GS). This reduces reliance on GS and lowers the communication overhead of collaborative training.
Methods: An FDRL-based intelligent routing method for LEO satellite communication networks is proposed. The method builds a routing model that integrates network, communication, and computational energy consumption, with maximizing the energy efficiency of the LEO satellite network as the optimization objective. A satellite clustering algorithm partitions the entire LEO satellite network into multiple clusters, and the FDRL framework is applied within each cluster. Each LEO satellite performs local reinforcement learning with the Advantage Actor-Critic (A2C) algorithm. The policy network generates routing actions, while the value network evaluates state values to reduce the variance of policy updates. After a specified number of training rounds, the Federated Proximal algorithm (FedProx) is run at the cluster head satellite to aggregate the models within the cluster. By sharing model parameters among satellites, the cluster members jointly train a global model with better generalization capability, which is used to optimize the network's energy efficiency.
Results and Discussions: To validate the proposed method, the LEO satellite constellation is first clustered with the proposed clustering algorithm. Each cluster contains 6 to 8 Cluster Member (CM) nodes (Fig. 5), and the number of CM node changes does not exceed 5, indicating that the clustering is relatively stable. FDRL training is then performed within each cluster. Among the aggregation frequencies tested, an aggregation frequency of 400 (i.e., aggregation every 400 time slots) yields the lowest training energy consumption (Fig. 6) and the most stable reward (Fig. 7). Next, the proposed FL-A2C algorithm is compared with baseline algorithms. FL-A2C converges better and attains a higher total reward than the benchmarks Sarsa, MAD2QN, and REINFORCE (Fig. 8), although its total reward is slightly lower than that of A2C. Compared with Sarsa, REINFORCE, and MAD2QN, the proposed method improves average network throughput by 83.7%, 19.8%, and 14.1%, respectively (Fig. 9). It reduces the average hop count by 25.0%, 18.9%, and 9.1%, respectively (Fig. 10), and improves energy efficiency by 55.6%, 42.9%, and 45.8%, respectively (Fig. 11).
Conclusions: To address the highly dynamic topology of LEO satellite networks and the limitations of traditional terrestrial routing methods, this research presents a multi-agent FDRL routing method combined with satellite clustering. Comprehensive simulations of the intelligent routing method show the following. (1) The proposed FL-A2C algorithm converges better and improves the energy efficiency of LEO satellite networks. (2) The proposed clustering scheme keeps the LEO satellite clusters stable. (3) The intelligent routing method outperforms the benchmark schemes (Sarsa, REINFORCE, MAD2QN) on all three metrics. It achieves 83.7%/19.8%/14.1% higher network throughput, 25.0%/18.9%/9.1% lower hop counts, and 55.6%/42.9%/45.8% higher energy efficiency, respectively.
- LEO satellite communication /
- Routing methods /
- Satellite clustering /
- Federated Deep Reinforcement Learning (FDRL) /
- Energy efficiency
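The Methods paragraph says FedProx aggregates the member satellites' models at the cluster head. The paper's own aggregation rules, Eqs. (27) and (28), are not reproduced in this section. The sketch below therefore assumes the standard FedProx formulation (Li et al., MLSys 2020). Each member minimizes its local loss plus a proximal term (mu/2)·||w - w_global||², and the head averages the returned parameters weighted by local sample counts. The quadratic local losses, sample counts, mu, learning rate, and step counts are hypothetical values chosen for illustration.

```python
def fedprox_local_update(w_global, grad_fn, lr, mu, local_steps):
    """Run local_steps gradient steps on F_k(w) + (mu / 2) * ||w - w_global||^2."""
    w = list(w_global)
    for _ in range(local_steps):
        g = grad_fn(w)  # gradient of the local loss F_k at w
        # The proximal term adds mu * (w - w_global) to the gradient
        w = [wi - lr * (gi + mu * (wi - wg)) for wi, gi, wg in zip(w, g, w_global)]
    return w


def aggregate(client_params, client_sizes):
    """Cluster-head step: average member parameters weighted by local sample counts."""
    total = sum(client_sizes)
    return [sum(n * p[j] for p, n in zip(client_params, client_sizes)) / total
            for j in range(len(client_params[0]))]


# Hypothetical heterogeneous members: F_k(w) = 0.5 * ||w - c_k||^2, gradient w - c_k
centers = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
sizes = [1, 1, 2]
grad_fns = [lambda w, c=c: [wi - ci for wi, ci in zip(w, c)] for c in centers]

w_global = [0.0, 0.0]
for _ in range(100):  # 100 aggregation rounds
    local = [fedprox_local_update(w_global, g, lr=0.1, mu=0.01, local_steps=5)
             for g in grad_fns]
    w_global = aggregate(local, sizes)
print([round(x, 6) for x in w_global])  # prints [1.75, 1.0]
```

With these quadratic losses, the global model converges to the sample-weighted mean of the members' optima, here [1.75, 1.0]. The proximal term only limits how far each member drifts from the current global model during its local steps. That limit is why FedProx is a common choice when member data are heterogeneous.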


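Fig. 1 System model

Fig. 2 Satellite clustering initialization process

Fig. 3 Schematic diagram of the A2C algorithm architecture

Fig. 4 Federated learning training process

Fig. 5 Number of state changes of cluster member nodes

Fig. 6 Energy consumption under different aggregation frequencies

Fig. 7 Reward curves under different aggregation frequencies

Fig. 8 Reward comparison of different algorithms

Fig. 9 Average network throughput versus the number of data packets

Fig. 10 Average hop count versus the number of data packets

Fig. 11 Energy efficiency comparison of different algorithms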

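Algorithm 1 FL-A2C-based intelligent routing algorithm (FL-A2C algorithm)

Initialization: $ E $ is the number of training episodes, $ T $ the number of time slots, $ {\mathcal{K}_t} $ the set of satellite clusters, $ \chi $ the satellite node queue size, $ {N_{{\text{packet}}}} $ the number of data packets, and $ {T_{{\text{agg}}}} $ the aggregation frequency;
1: The cluster head initializes the global model and distributes the network parameters to the member satellite nodes of its cluster
2: for episode = 1 to $ E $ do
3:   Clear the satellite queues, generate $ {N_{{\text{packet}}}} $ data packets and send them to the satellite nodes
4:   for step = 1 to $ T $ do
5:     for each cluster $ {\kappa _t} \in {\mathcal{K}_t} $ do
6:       for each agent $ {v_i} \in {\kappa _t} $ do
7:         if the occupancy of the agent's send queue $ Q_{{v_i}}^{{\text{send}}} $ < $ \chi $ and its receive queue $ Q_{{v_i}}^{{\text{recv}}} $ is not empty then
8:           Move the data packets in the receive queue $ Q_{{v_i}}^{{\text{recv}}} $ into the send queue $ Q_{{v_i}}^{{\text{send}}} $
9:         end if
10:        for p = 1 to $ |Q_{{v_i}}^{{\text{send}}}| $ do
11:          The agent takes a single data packet from the queue and observes the current state $ s_t^{{v_i},p} $
12:          Input the state $ s_t^{{v_i},p} $ to the Actor network and sample the action $ a_t^{{v_i},p} $ from the policy $ \pi \left( {{a_t}\mid{s_t},{{\boldsymbol{\theta}} _t}} \right) $
13:          if the occupancy of the next-hop node's receive queue < $ \chi $ then
14:            Execute the action $ a_t^{{v_i},p} $ and obtain the reward $ r_t^{{v_i},p} $, the next state $ s_{t + 1}^{{v_i},p} $, and the binary termination flag done
15:            Record the historical action information: let $ a_{t - 1}^{{v_i},p} = a_t^{{v_i},p} $
16:          else
17:            Place data packet p at the tail of the receive queue
18:          end if
19:          Input the current state $ s_t^{{v_i},p} $, the next state $ s_{t + 1}^{{v_i},p} $, the reward $ r_t^{{v_i},p} $, and the action probability density to the Critic network
20:          Compute the TD error by Eq. (19) and update the Critic network parameters by gradient descent on the loss function of Eq. (26)
21:          Obtain the advantage function from the TD error and update the Actor network parameters by gradient descent on the loss function of Eq. (25)
22:        end for
23:      end for
24:    end for
25:  end for
26:  Record the system time: let $ {t_{{\text{total}}}} = {t_{{\text{total}}}} + T $
27:  if $ {t_{{\text{total}}}} $ % $ {T_{{\text{agg}}}} $ == 0 then
28:    Randomly select two clusters, perform federated aggregation at their cluster heads according to Eqs. (27) and (28), and distribute the global model parameters to the clients
29:  end if
30: end for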
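Lines 19 to 21 of Algorithm 1 perform a one-step actor-critic update. Eqs. (19), (25) and (26) are not reproduced in this section, so the sketch assumes the standard one-step forms. The TD error is δ = r + γV(s') - V(s), with V(s') = 0 when done, and it serves as the advantage estimate. The critic takes a gradient step on δ²/2 with the target held fixed. The actor takes a gradient step on -δ·log π(a|s). For brevity the sketch uses tabular parameters instead of the two-layer networks in Table 2. The class name and the toy routing decision are hypothetical. The default γ, α_θ and α_ω follow Table 2, and the demo raises the learning rates so that it converges quickly.

```python
import math
import random


class TabularA2C:
    """One-step advantage actor-critic with a tabular softmax actor and tabular critic."""

    def __init__(self, n_actions, gamma=0.99, lr_actor=3e-4, lr_critic=5e-4):
        self.n_actions = n_actions
        self.gamma = gamma          # discount factor (Table 2: 0.99)
        self.lr_actor = lr_actor    # alpha_theta (Table 2: 0.0003)
        self.lr_critic = lr_critic  # alpha_omega (Table 2: 0.0005)
        self.prefs = {}             # actor parameters: state -> action preferences
        self.values = {}            # critic parameters: state -> V(s)

    def policy(self, s):
        prefs = self.prefs.setdefault(s, [0.0] * self.n_actions)
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, s, rng):
        # Sample a ~ pi(. | s), as in line 12 of Algorithm 1
        return rng.choices(range(self.n_actions), weights=self.policy(s))[0]

    def update(self, s, a, r, s_next, done):
        v = self.values.get(s, 0.0)
        v_next = 0.0 if done else self.values.get(s_next, 0.0)
        td_error = r + self.gamma * v_next - v  # one-step TD error, used as the advantage
        # Critic: gradient descent on 0.5 * td_error**2, holding the TD target fixed
        self.values[s] = v + self.lr_critic * td_error
        # Actor: gradient descent on -td_error * log pi(a|s);
        # for a softmax, d log pi(a|s) / d pref[b] = 1[b == a] - pi(b|s)
        probs = self.policy(s)
        prefs = self.prefs[s]
        for b in range(self.n_actions):
            prefs[b] += self.lr_actor * td_error * ((1.0 if b == a else 0.0) - probs[b])
        return td_error


# Hypothetical one-hop decision: from state "v0", only next hop 2 delivers the packet
rng = random.Random(0)
agent = TabularA2C(n_actions=3, lr_actor=0.1, lr_critic=0.1)
for _ in range(2000):
    a = agent.act("v0", rng)
    agent.update("v0", a, 1.0 if a == 2 else 0.0, None, done=True)
print([round(p, 3) for p in agent.policy("v0")])
```

After training, almost all of the probability mass sits on next hop 2. The critic's V(s) acts as a baseline. It changes the variance of the actor update but not its expected direction, which is the variance-reduction role the abstract assigns to the value network.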


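Table 1 Constellation and routing parameters

Constellation/routing parameter	Value
Orbital altitude $ H $, inclination $ \beta $	550 km, 53°
Constellation parameters $ M $, $ N $	10, 10
Maximum intra-cluster hop count $ h $	3 hops
Maximum tolerable delay $ {\delta _s} $	1000 ms
Packet size $ {l_s} $	1024 bytes
Queue buffer size $ \chi $	1 Mb
CPU parameters $ f $, $ {C_k} $	3 GHz, 1 kcycle/bit
Transmit power $ P $, bandwidth $ B $	5 kW, 100 Mbps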
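Several per-packet quantities follow directly from the Table 1 values and can be checked by hand. The sketch assumes 1 Mb = 10^6 bits. It assumes computation load = packet bits × C_k and uses B = 100 Mbps as the link rate. Transmission energy is taken as P × transmission time. These are illustrative assumptions, since the paper's energy model is not reproduced in this section, and the constant names are hypothetical.

```python
PACKET_BYTES = 1024          # l_s
QUEUE_BITS = 1_000_000       # chi = 1 Mb, assuming 1 Mb = 10**6 bits
CPU_HZ = 3e9                 # f = 3 GHz
CYCLES_PER_BIT = 1_000       # C_k = 1 kcycle/bit
TX_POWER_W = 5_000           # P = 5 kW
LINK_RATE_BPS = 100e6        # B = 100 Mbps, assumed here to be the link rate

packet_bits = PACKET_BYTES * 8                       # bits per packet
queue_packets = QUEUE_BITS // packet_bits            # whole packets that fit in the buffer
cpu_cycles = packet_bits * CYCLES_PER_BIT            # cycles to process one packet
cpu_time_ms = cpu_cycles / CPU_HZ * 1e3              # processing time per packet
tx_time_us = packet_bits / LINK_RATE_BPS * 1e6       # transmission time per packet
tx_energy_j = TX_POWER_W * packet_bits / LINK_RATE_BPS  # P x transmission time

print(packet_bits, queue_packets, cpu_cycles)                               # 8192 122 8192000
print(round(cpu_time_ms, 4), round(tx_time_us, 2), round(tx_energy_j, 4))  # 2.7307 81.92 0.4096
```

Under these assumptions, a node buffers about 122 packets. Processing one packet takes about 2.73 ms, well above the roughly 82 µs needed to transmit it. If 1 Mb is read as 2^20 bits instead, the buffer holds exactly 128 packets.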


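Table 2 Deep neural network model parameters

Model parameter	Value
Loss function	Mean squared error (MSE)
Optimizer	Adam
Activation functions	ReLU, Softmax
Actor network, Critic network	2 hidden layers of 128 units each
Discount factor $ \gamma $	0.99
Learning rates $ {\alpha _{\boldsymbol{\theta}} } $, $ {\alpha _{\boldsymbol{\omega}} } $	0.0003, 0.0005
Federated aggregation frequency $ {T_{{\text{agg}}}} $	Once every 400 time slots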
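Table 2 fixes the network shapes: Actor and Critic each have two hidden layers of 128 ReLU units. Softmax presumably turns the Actor output into action probabilities, and the Critic is trained with an MSE loss. The pure-Python forward pass below is a sketch under stated assumptions. The state dimension (6), the number of candidate next hops (4), the uniform ±1/sqrt(n_in) initialisation and the TD target are all hypothetical, because the state and action definitions are not given in this section. Training with Adam is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Dense layer with weights drawn from U(-1/sqrt(n_in), 1/sqrt(n_in)) and zero bias."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, z) for z in v]


def softmax(v):
    m = max(v)
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def make_mlp(sizes, rng):
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(mlp, x):
    """ReLU on every hidden layer, linear output layer."""
    for layer in mlp[:-1]:
        x = relu(dense(x, layer))
    return dense(x, mlp[-1])


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


STATE_DIM, N_ACTIONS, HIDDEN = 6, 4, 128   # STATE_DIM and N_ACTIONS are hypothetical
rng = random.Random(42)
actor = make_mlp([STATE_DIM, HIDDEN, HIDDEN, N_ACTIONS], rng)
critic = make_mlp([STATE_DIM, HIDDEN, HIDDEN, 1], rng)

state = [0.1, 0.5, 0.0, 1.0, 0.3, 0.2]          # hypothetical normalized state features
action_probs = softmax(forward(actor, state))    # pi(a | s) over candidate next hops
value = forward(critic, state)[0]                # V(s)
critic_loss = mse([value], [1.0])                # MSE against a hypothetical TD target
print([round(p, 3) for p in action_probs], round(value, 3), round(critic_loss, 3))
```

The Actor and Critic share the same hidden-layer shape but have separate parameters. The two learning rates α_θ and α_ω in Table 2 apply to them independently.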

