Federated Deep Reinforcement Learning-based Intelligent Routing Design for LEO Satellite Networks

LI Xuehua, LIAO Hailong, ZHANG Xian, ZHOU Jiaen

Citation: LI Xuehua, LIAO Hailong, ZHANG Xian, ZHOU Jiaen. Federated Deep Reinforcement Learning-based Intelligent Routing Design for LEO Satellite Networks[J]. Journal of Electronics & Information Technology, 2025, 47(8): 2652-2664. doi: 10.11999/JEIT250072


doi: 10.11999/JEIT250072 cstr: 32379.14.JEIT240072
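Funds: The National Natural Science Foundation of China (62401066), Beijing Natural Science Foundation (L222004), The Young Backbone Teacher Support Plan of Beijing Information Science & Technology University (BISTU) (YBT 202420), The BISTU Research Foundation (2024XJJ07)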
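Details
    Author biographies:

    LI Xuehua: female, professor and doctoral supervisor; research interests include intelligent computing and wireless communications.

    LIAO Hailong: male, master's student; research interests include intelligent routing for LEO satellite communication networks and reinforcement learning.

    ZHANG Xian: male, lecturer and master's supervisor; research interests include non-terrestrial wireless networks and integrated communication, sensing, and computing.

    ZHOU Jiaen: male, doctoral student; research interests include integrated satellite communication, sensing, and computing technologies.

    Corresponding author:

    ZHANG Xian zhangxian@bistu.edu.cn

  • CLC number: TN927.2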

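  • Abstract: The topology of low Earth orbit (LEO) satellite communication networks changes dynamically, so routing methods designed for terrestrial networks cannot be applied directly. Because onboard satellite resources are limited, artificial-intelligence-based routing methods typically learn inefficiently, while collaborative training requires data sharing and transmission, which is difficult and carries data security risks. To address these challenges, this paper proposes a multi-agent federated deep reinforcement learning routing method based on satellite clustering. First, a routing model for LEO satellite communication networks is designed that combines network topology, communication, and energy consumption. Then, the constellation nodes are partitioned into multiple clusters according to the average connection degree of each satellite. Within each cluster, a federated deep reinforcement learning framework is adopted: the satellites in a cluster cooperatively share model parameters and jointly train that cluster's global model, with the goal of maximizing network energy efficiency. Finally, simulation results show that, compared with the three baselines Sarsa, MAD2QN, and REINFORCE, the proposed method increases average network throughput by 83.7%, 19.8%, and 14.1%, reduces the average packet hop count by 25.0%, 18.9%, and 9.1%, and improves network energy efficiency by 55.6%, 42.9%, and 45.8%, respectively.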
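The clustering step in the abstract (partitioning nodes by average connection degree) can be illustrated with a small sketch. The actual procedure is the one in the paper's Figure 2, which is not reproduced on this page, so everything below is an assumption: the greedy head selection, the hop-limited breadth-first growth, and the names `cluster_by_degree`, `adj`, `avg_degree`, and `max_hops` are all hypothetical.

```python
from collections import deque


def cluster_by_degree(adj, avg_degree, max_hops):
    """Greedy clustering sketch (NOT the paper's exact procedure).

    The unassigned node with the highest average degree becomes a cluster
    head and absorbs every still-unassigned node reachable within
    `max_hops` hops through unassigned nodes.
    """
    cluster_of = {}   # node -> head of the cluster it belongs to
    clusters = {}     # head -> list of members (head first)
    # Visit candidates from highest to lowest average degree; ties by node id.
    for head in sorted(adj, key=lambda v: (-avg_degree[v], v)):
        if head in cluster_of:
            continue
        cluster_of[head] = head
        members = [head]
        seen = {head}
        frontier = deque([(head, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue  # hop budget exhausted on this branch
            for nxt in adj[node]:
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt not in cluster_of:
                    cluster_of[nxt] = head
                    members.append(nxt)
                    frontier.append((nxt, hops + 1))
        clusters[head] = members
    return clusters


# Toy 6-node chain 0-1-2-3-4-5 with hypothetical time-averaged degrees.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
avg_degree = {0: 1.0, 1: 2.0, 2: 2.0, 3: 2.0, 4: 2.0, 5: 1.0}
clusters = cluster_by_degree(adj, avg_degree, max_hops=1)
```

Growing a cluster only through unassigned nodes keeps each cluster connected, which matters if the cluster head is later used as the aggregation point for its members.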
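  • Figure 1  System model

    Figure 2  Satellite clustering initialization procedure

    Figure 3  Schematic of the A2C algorithm architecture

    Figure 4  Federated learning training procedure

    Figure 5  Number of state changes of intra-cluster member nodes

    Figure 6  Energy consumption under different aggregation frequencies

    Figure 7  Reward curves under different aggregation frequencies

    Figure 8  Reward comparison across algorithms

    Figure 9  Average network throughput versus number of packets

    Figure 10  Average hop count versus number of packets

    Figure 11  Energy efficiency comparison across algorithms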

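    Algorithm 1  FL-A2C-based intelligent routing algorithm (FL-A2C algorithm)

     Initialization: $ E $ is the number of training episodes, $ T $ is the number of time slots, $ {\mathcal{K}_t} $ is the set of satellite clusters, $ \chi $ is the satellite node queue size, $ {N_{{\text{packet}}}} $ is the number of packets, and $ {T_{{\text{agg}}}} $ is the aggregation frequency;
     1: Each cluster head initializes the global model and distributes the network parameters to the member satellite nodes of its cluster
     2: for episode = 1 to $ E $ do
     3:  Clear the satellite queues, generate $ {N_{{\text{packet}}}} $ packets, and send them to the satellite nodes
     4:  for step = 1 to $ T $ do
     5:   for $ |{\kappa _t}| $ = 1 to $| {\mathcal{K}_t}| $ do
     6:    for agent = 1 to $ {\kappa _t} $ do
     7:     if the occupancy of the agent's send queue $ Q_{{v_i}}^{{\text{send}}} $ < $ \chi $ and the occupancy of its receive queue $ Q_{{v_i}}^{{\text{recv}}} $ != 0 then
     8:      Move the packets in the receive queue $ Q_{{v_i}}^{{\text{recv}}} $ into the send queue $ Q_{{v_i}}^{{\text{send}}} $
     9:     end if
     10:     for p = 1 to $ Q_{{v_i}}^{{\text{send}}} $ do
     11:      The agent takes a single packet from the queue and obtains the current state $ s_t^{{v_i},p} $
     12:      Feed the state $ s_t^{{v_i},p} $ into the Actor network and sample an action $ a_t^{{v_i},p} $ from the policy $ \pi \left( {{a_t}|{s_t},{{\boldsymbol{\theta}} _t}} \right) $
     13:      if the occupancy of the next-hop node's receive queue $ Q_{{v_i}}^{{\text{recv}}} $ < $ \chi $ then
     14:       Execute the action $ a_t^{{v_i},p} $ and obtain the reward $ r_t^{{v_i},p} $, the next state $ s_{t + 1}^{{v_i},p} $, and the binary termination flag done
     15:       Record the action history, letting $ a_{t - 1}^{{v_i},p} = a_t^{{v_i},p} $
     16:      else
     17:       Put packet p at the tail of the receive queue
     18:      end if
     19:      Feed the current state $ s_t^{{v_i},p} $, the next state $ s_{t + 1}^{{v_i},p} $, the reward $ r_t^{{v_i},p} $, and the action probability density into the Critic network
     20:      Compute the TD error by Eq. (19) and update the Critic network parameters by gradient descent on the loss function in Eq. (26)
     21:      Obtain the advantage function from the TD error and update the Actor network parameters by gradient descent on the loss function in Eq. (25)
     22:     end for
     23:    end for
     24:   end for
     25:  end for
     26:  Record the system time, letting $ {t_{{\text{total}}}} = {t_{{\text{total}}}} + T $
     27:  if $ {t_{{\text{total}}}} $ % $ {T_{{\text{agg}}}} $ == 0 then
     28:   Randomly select two clusters, perform federated aggregation at the cluster heads by Eqs. (27) and (28), and distribute the global model parameters to the clients
     29:  end if
     30: end for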
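Steps 20, 21, 27, and 28 of the listing can be sketched in a few lines. Eqs. (19) and (25) to (28) are not reproduced on this page, so the code uses the textbook one-step TD error and a FedAvg-style weighted mean as stand-ins; these may differ from the paper's exact formulas. The names `td_error`, `fed_avg`, and `should_aggregate`, and the choice of aggregation weights, are hypothetical.

```python
def td_error(reward, value, next_value, done, gamma=0.99):
    """Textbook one-step TD error: r + gamma * V(s') - V(s).

    In A2C the same quantity is commonly used as the advantage estimate
    that scales the Actor's policy-gradient loss (assumed here).
    """
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def fed_avg(client_params, client_weights):
    """FedAvg-style weighted mean of flat parameter vectors.

    `client_params` is a list of equal-length lists of floats and
    `client_weights` holds one non-negative weight per client, for
    example its number of local samples (an assumption).
    """
    total = sum(client_weights)
    dim = len(client_params[0])
    return [
        sum(w * p[i] for p, w in zip(client_params, client_weights)) / total
        for i in range(dim)
    ]


def should_aggregate(t_total, t_agg):
    """Step 27: aggregate when elapsed time is a multiple of T_agg."""
    return t_total % t_agg == 0


# Toy usage: two clients, the second weighted three times as heavily.
delta = td_error(reward=1.0, value=0.5, next_value=2.0, done=False, gamma=0.5)
global_params = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only parameter vectors cross the inter-satellite links in this scheme, not the locally collected transitions, which is the property the abstract relies on to avoid sharing raw data.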

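    Table 1  Constellation and routing parameters

    | Constellation and routing parameter | Value |
    | --- | --- |
    | Orbit altitude $ H $, inclination $ \beta $ | 550 km, 53° |
    | Constellation parameters $ M $, $ N $ | 10, 10 satellites |
    | Maximum intra-cluster hop count $ h $ | 3 hops |
    | Maximum tolerable delay $ {\delta _s} $ | 1000 ms |
    | Single packet size $ {l_s} $ | 1024 bytes |
    | Queue buffer size $ \chi $ | 1 Mb |
    | CPU parameters $ f $, $ {C_k} $ | 3 GHz, 1 kcycle/bit |
    | Transmit power $ P $, bandwidth $ B $ | 5 kW, 100 Mbps |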
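A few per-packet quantities follow from the Table 1 values by simple arithmetic. The sketch below assumes that 1 Mb means 10^6 bits, that 1 kcycle means 1000 CPU cycles, that the 100 Mbps figure can be read as the link data rate, and that processing delay is cycles per bit times packet bits divided by CPU frequency. These are common modelling conventions, not formulas quoted from the paper, and the function names are hypothetical.

```python
PACKET_BYTES = 1024          # single packet size l_s
QUEUE_BITS = 1_000_000       # queue buffer chi, ASSUMING 1 Mb = 10**6 bits
CPU_HZ = 3e9                 # CPU frequency f
CYCLES_PER_BIT = 1_000       # C_k, ASSUMING 1 kcycle = 1000 cycles
LINK_BPS = 100e6             # B, ASSUMING it can be read as a data rate


def packet_bits(packet_bytes=PACKET_BYTES):
    """Packet size in bits."""
    return packet_bytes * 8


def queue_capacity_packets(queue_bits=QUEUE_BITS, packet_bytes=PACKET_BYTES):
    """Whole packets that fit into one queue buffer."""
    return queue_bits // packet_bits(packet_bytes)


def processing_delay_s(packet_bytes=PACKET_BYTES):
    """Assumed model: cycles needed for the packet divided by CPU frequency."""
    return CYCLES_PER_BIT * packet_bits(packet_bytes) / CPU_HZ


def transmission_delay_s(packet_bytes=PACKET_BYTES):
    """Assumed model: packet bits divided by link data rate."""
    return packet_bits(packet_bytes) / LINK_BPS


if __name__ == "__main__":
    print("packets per queue:", queue_capacity_packets())
    print("processing delay (ms):", processing_delay_s() * 1e3)
    print("transmission delay (us):", transmission_delay_s() * 1e6)
```

Under these assumptions a queue holds on the order of a hundred packets and per-packet processing takes a few milliseconds, both small relative to the 1000 ms maximum tolerable delay in the table.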

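    Table 2  Deep neural network model parameters

    | Model parameter | Value |
    | --- | --- |
    | Loss function | Mean squared error (MSE) |
    | Optimizer | Adam |
    | Activation functions | ReLU, Softmax |
    | Actor network, Critic network | 2 hidden layers of 128 units each |
    | Discount factor $ \gamma $ | 0.99 |
    | Learning rates $ {\alpha _{\boldsymbol{\theta}} } $, $ {\alpha _{\boldsymbol{\omega}} } $ | 0.0003, 0.0005 |
    | Federated aggregation frequency $ {T_{{\text{agg}}}} $ | One aggregation every 400 time slots |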
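Because federated aggregation exchanges model parameters between satellites, the size of the Actor and Critic networks gives a rough idea of the payload per aggregation. The sketch below counts the weights and biases of a fully connected network with the two 128-unit hidden layers from Table 2. The state dimension, the number of actions, the Critic taking only the state as input, and 32-bit storage are all hypothetical, since this page does not state them.

```python
HIDDEN_SIZES = (128, 128)   # from Table 2
STATE_DIM = 8               # hypothetical: not given on this page
NUM_ACTIONS = 4             # hypothetical: e.g. one per inter-satellite link
BYTES_PER_PARAM = 4         # hypothetical: float32 storage


def mlp_param_count(input_dim, hidden_sizes, output_dim):
    """Weights plus biases of a fully connected feed-forward network."""
    sizes = [input_dim, *hidden_sizes, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Actor outputs one logit per action (Softmax); Critic outputs one value.
actor_params = mlp_param_count(STATE_DIM, HIDDEN_SIZES, NUM_ACTIONS)
critic_params = mlp_param_count(STATE_DIM, HIDDEN_SIZES, 1)
upload_bytes = (actor_params + critic_params) * BYTES_PER_PARAM

if __name__ == "__main__":
    print("actor parameters:", actor_params)
    print("critic parameters:", critic_params)
    print("bytes per model upload:", upload_bytes)
```

With one aggregation every 400 time slots, this payload is sent once per aggregation round per participating satellite, so the count above scales directly with how often aggregation is triggered.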
Publication history
  • Received:  2025-02-12
  • Revised:  2025-07-17
  • Published online:  2025-07-26
  • Issue date:  2025-08-27
