A Novel Beam Hopping Resource Allocation Scheme of Low Earth Orbit Satellite Based on Transfer Deep Reinforcement Learning

CHEN Qianbin, MA Shiqing, DUAN Ruiji, TANG Lun, LIANG Chengchao

Citation: CHEN Qianbin, MA Shiqing, DUAN Ruiji, TANG Lun, LIANG Chengchao. A Novel Beam Hopping Resource Allocation Scheme of Low Earth Orbit Satellite Based on Transfer Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT211457


doi: 10.11999/JEIT211457
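Funds: The National Natural Science Foundation of China (62071078, 62001076), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601, KJQN-201900645)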
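Details
    About the authors:

    CHEN Qianbin: male, professor and doctoral supervisor. Research interests: space-air-ground integration, multimedia information processing and transmission, and heterogeneous cellular networks.

    MA Shiqing: male, master's student. Research interests: space-air-ground integration, satellite-terrestrial convergence, and machine learning algorithms.

    DUAN Ruiji: male, master's student. Research interests: space-air-ground integration, satellite-terrestrial convergence, and convex optimization algorithms.

    TANG Lun: male, professor, Ph.D. Research interests: space-air-ground integration, next-generation wireless communication networks, and software-defined wireless networks.

    LIANG Chengchao: male, professor, Ph.D. Research interests: wireless communications, space-air-ground integrated networks, and (satellite) Internet architectures and protocols.

    Corresponding author:

    LIANG Chengchao liangcc@cqupt.edu.cn

  • CLC number: TN927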

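  • Abstract: In Low Earth Orbit (LEO) satellite scenarios, conventional resource allocation schemes often leave particular cells with fewer resources than they demand. To address this problem, this paper proposes a beam hopping resource allocation scheme for LEO satellites based on Transfer Deep Reinforcement Learning (TDRL). First, the scheme jointly considers the on-board buffer information, the traffic arrivals, and the channel state, and builds a resource allocation optimization model for LEO satellites that support beam hopping, with the objective of minimizing the average delay of data packets on the satellite. Second, because LEO satellite networks are highly dynamic, the paper takes into account randomly varying communication resources and communication demands, and adopts the Deep Q-Network (DQN) algorithm, which uses a neural network as a nonlinear function approximator. Furthermore, to enable and accelerate the convergence of the Deep Reinforcement Learning (DRL) algorithm on other target tasks, the paper introduces Transfer Learning (TL): the scheduling task learned by a source satellite is used to quickly find the beam scheduling and power allocation policy of the target satellite. Simulation results show that the proposed algorithm optimizes slot allocation during satellite service, reduces the average packet transmission delay, and effectively improves system throughput and resource utilization efficiency.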
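  • Figure 1  LEO satellite communication architecture based on beam hopping

    Figure 2  Architecture of the TL-DRL-based LEO-BH system

    Figure 3  State reconstruction process

    Figure 4  Deep neural network model

    Figure 5  Convergence performance of the TL-DQN algorithm for different transfer rate factors

    Figure 6  System performance when cell demand varies regularly

    Figure 7  System performance versus traffic arrival rate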

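    Algorithm 1  TL-DQN-based beam hopping scheme for LEO satellites
     (1)  Initialize the multi-beam LEO satellite network parameters, the cell parameters, and the experience replay pool ${\boldsymbol{D}}$ with capacity $ N $
     (2)  Randomly initialize the parameters $ w $ of the action-value network $ Q $; initialize the parameters $ {w^ - } $ of the target action-value network $ {Q^ - } $ and set $ {w^ - } = w $
     (3)  For episode $ {n_{{\text{epochs}}}} = 1,2, \cdots ,{N_{{\text{epochs}}}} $ do
     (4)   Initialize $ \varepsilon $ with $ \varepsilon = 1 - (0.5 + {n_{{\text{epochs}}}}/{N_{{\text{epochs}}}} \times 0.3) $, so that the exploration probability decreases gradually
     (5)   Obtain the initial state $ s({t_0}) $, the local policy $ \pi _{{t_0}}^{{\text{tg}}}(s({t_0}),a({t_0})) $, and the transferred external policy $ \pi _{{t_0}}^{\text{s}}(s({t_0}),a({t_0})) $
     (6)   For time i=0,1,···, $ |{\mathbf{T}}| - 1 $ do
     (7)    Draw a random probability $ p $
     (8)    If $p \le \varepsilon$:
     (9)     The LEO satellite randomly selects an action $ a({t_i}) $ that satisfies the constraints
     (10)    Else
     (11)     Obtain the overall policy from Eq. (26) and, according to $ \pi _{{t_0}}^{{\text{tg}}}(s({t_0}),a({t_0})) $, select the action $ a({t_i}) = \arg {\max _{a({t_i})}}Q(s({t_i}),a({t_i});w) $ to carry out beam scheduling and resource allocation for the LEO satellite; then update the environment state to $ s({t_{i + 1}}) $ and immediately receive the reward $ r({t_i}) $
     (12)     Store the 4-tuple $ (s({t_i}),a({t_i}),r({t_i}),s({t_{i + 1}})) $ in ${\boldsymbol{D}}$
     (13)     Randomly sample a mini-batch of tuples $ (s({t_i}),a({t_i}),r({t_i}),s({t_{i + 1}})) $ from ${\boldsymbol{D}}$
     (14)     Compute the loss function using Eq. (19)
     (15)     Compute the exponentially weighted averages of the first and second moments using Eqs. (21) and (22), respectively
     (16)     Following the Adam algorithm, evaluate Eqs. (23) and (24) to obtain the bias-corrected first and second moments, respectively
     (17)     Update the weights $ w $ of the evaluation network $ Q $ using Eq. (25)
     (18)     Every $ G $ steps, replace the parameters $ {w^ - } $ of the target network $ {Q^ - } $ with the parameters $ w $ of the $ Q $ network
     (19)  End for
     (20) End for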
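The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It covers only the exploration schedule of step (4), the action selection of steps (7) to (11), the replay pool of steps (12) and (13), and the target synchronization of step (18). All function and class names are hypothetical. Because Eq. (26) is not reproduced on this page, the way the local and transferred policies are combined here (a convex blend weighted by the transfer rate factor $ \eta $) is an assumption, not the authors' formula.

```python
import random
from collections import deque

import numpy as np

N_EPOCHS = 600          # training episodes (Table 2)
BUFFER_CAPACITY = 5000  # replay pool capacity (Table 2)
BATCH_SIZE = 10         # mini-batch size (Table 2)
TARGET_SYNC_G = 100     # target-network update period G (Table 2)


def epsilon_for_episode(n_epoch, n_epochs=N_EPOCHS):
    """Step (4): epsilon = 1 - (0.5 + n/N * 0.3), decaying from 0.5 towards 0.2."""
    return 1.0 - (0.5 + n_epoch / n_epochs * 0.3)


def mixed_scores(q_local, q_source, eta):
    """ASSUMPTION: convex blend of local and transferred action scores.

    This stands in for Eq. (26), which is not shown in the source.
    """
    return (1.0 - eta) * np.asarray(q_local) + eta * np.asarray(q_source)


def select_action(q_local, q_source, eta, epsilon, feasible, rng):
    """Steps (7)-(11): explore with probability epsilon, else act greedily
    over the feasible actions only."""
    feasible = list(feasible)
    if rng.random() <= epsilon:
        return rng.choice(feasible)
    scores = mixed_scores(q_local, q_source, eta)
    return max(feasible, key=lambda a: scores[a])


class ReplayBuffer:
    """Steps (12)-(13): fixed-capacity pool D with uniform mini-batch sampling."""

    def __init__(self, capacity=BUFFER_CAPACITY):
        self.data = deque(maxlen=capacity)  # oldest tuples are dropped first

    def push(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def should_sync_target(step, period=TARGET_SYNC_G):
    """Step (18): copy w into w^- every G steps."""
    return step > 0 and step % period == 0
```

With $ \eta = 0 $ the blend reduces to the local Q-values, which matches the intuition that a zero transfer rate corresponds to plain DQN; larger values give the source satellite's knowledge more influence on the greedy choice.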

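    Table 1  Parameter settings for the LEO satellite scenario

    | LEO satellite network parameter | Value |
    |---|---|
    | Satellite orbit altitude $h$ | 781 km |
    | Number of satellite beams $K$ | 7 |
    | Total number of served cells $N$ | 49 |
    | Cell diameter $D$ | 667 km |
    | Total channel bandwidth $B_w$ | 250 MHz |
    | Downlink operating frequency $f$ | 20 GHz |
    | Maximum satellite transmit antenna gain $G_m$ | 41.6 dBi |
    | User receive antenna gain $G_r$ | 20 dBi |
    | Noise power density $N_0$ | -174 dBm/Hz |
    | Maximum satellite transmit power $P_{\text{tot}}$ | 20 dBW |
    | Maximum transmit power per beam $P_{\max}$ | 18 dBW |
    | Half-beam angle of the multi-beam antenna $ {\theta _\alpha } $ | $ {2^ \circ } $ |
    | Data packet size $M$ | 50 kbits |
    | Traffic packet arrival rate $ {\lambda _{{c_n}}}(t) $ | [1,21] |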
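To give a feel for the numbers in Table 1, the following Python sketch works through a rough downlink budget. It is a back-of-the-envelope illustration under stated assumptions, not the channel model of the paper: the slant range is taken equal to the orbit altitude (user at the sub-satellite point), only free-space loss is counted, the total power is split equally over the $K$ beams, and noise is integrated over the full bandwidth. The helper names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

# Values copied from Table 1.
ALTITUDE_M = 781e3
NUM_BEAMS = 7
BANDWIDTH_HZ = 250e6
FREQ_HZ = 20e9
TX_GAIN_DBI = 41.6
RX_GAIN_DBI = 20.0
N0_DBM_PER_HZ = -174.0
P_TOT_DBW = 20.0
P_MAX_DBW = 18.0


def dbw_to_watt(p_dbw):
    return 10.0 ** (p_dbw / 10.0)


def watt_to_dbm(p_watt):
    return 10.0 * math.log10(p_watt) + 30.0


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def noise_power_dbm(n0_dbm_per_hz, bandwidth_hz):
    """Noise power over the band: N0 + 10*log10(B)."""
    return n0_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


# ASSUMPTION: equal power split over the beams, capped by the per-beam limit.
per_beam_watt = min(dbw_to_watt(P_TOT_DBW) / NUM_BEAMS, dbw_to_watt(P_MAX_DBW))

# ASSUMPTION: slant range equals the orbit altitude; no rain or pointing loss.
fspl_db = free_space_path_loss_db(ALTITUDE_M, FREQ_HZ)
noise_dbm = noise_power_dbm(N0_DBM_PER_HZ, BANDWIDTH_HZ)
rx_dbm = watt_to_dbm(per_beam_watt) + TX_GAIN_DBI + RX_GAIN_DBI - fspl_db
snr_db = rx_dbm - noise_dbm

print(f"per-beam power: {per_beam_watt:.2f} W")
print(f"free-space loss: {fspl_db:.2f} dB")
print(f"noise power: {noise_dbm:.2f} dBm")
print(f"SNR at the sub-satellite point: {snr_db:.2f} dB")
```

One observation follows directly from the table: seven beams at the per-beam limit of 18 dBW (about 63 W each) would exceed the 20 dBW (100 W) total, so the total-power limit is the binding one whenever all beams are heavily loaded.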

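    Table 2  Parameter settings for the TL-DQN algorithm

    | TL-DQN algorithm parameter | Value |
    |---|---|
    | Number of training episodes $ {N_{{\text{epochs}}}} $ | 600 |
    | Number of time slots per episode $ |{\mathbf{T}}| $ | 6000 |
    | Experience pool capacity | 5000 |
    | Discount factor $ \gamma $ | 0.9 |
    | Learning rate $ \alpha $ | 0.0001 |
    | Mini-batch size $ N_t $ | 10 |
    | Optimizer | Adam |
    | Adam optimizer $ {\beta _1} $ | 0.9 |
    | Adam optimizer $ {\beta _2} $ | 0.999 |
    | Dropout rate | 0.2 |
    | Update period $ G $ of the target network $ {Q^ - } $ | 100 |
    | Activation function | ReLU |
    | Exploration probability $ \varepsilon $ | (0.2,0.5) |
    | Transfer rate factor $ \eta $ | {0,0.2,0.5} |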
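The hyperparameters in Table 2 plug into the standard DQN target and the standard Adam update, which is what steps (14) to (17) of Algorithm 1 refer to. The sketch below shows the textbook forms of both, using the values of $ \gamma $, $ \alpha $, $ {\beta _1} $ and $ {\beta _2} $ from the table. Equations (19) and (21) to (25) are not reproduced on this page, so this is the generic formulation rather than a transcription of the paper's equations; the function names and the stabilizing constant `eps` are assumptions.

```python
import numpy as np

GAMMA = 0.9          # discount factor (Table 2)
LEARNING_RATE = 1e-4  # learning rate alpha (Table 2)
BETA1 = 0.9          # Adam beta_1 (Table 2)
BETA2 = 0.999        # Adam beta_2 (Table 2)


def td_targets(rewards, next_q_target, gamma=GAMMA):
    """Textbook DQN target: y = r + gamma * max_a' Q^-(s', a').

    next_q_target has shape (batch, num_actions) and comes from the
    target network Q^-.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    return rewards + gamma * next_q_target.max(axis=1)


def mse_loss(targets, q_taken):
    """Mean squared TD error between the targets and Q(s, a; w)."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_taken, dtype=float)
    return float(np.mean(diff ** 2))


def adam_step(w, grad, m, v, t, lr=LEARNING_RATE,
              beta1=BETA1, beta2=BETA2, eps=1e-8):
    """One Adam update; t is the 1-based step count.

    eps is an ASSUMED stabilizer, it is not listed in Table 2.
    """
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment average
    v = beta2 * v + (1.0 - beta2) * grad ** 2    # second-moment average
    m_hat = m / (1.0 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)               # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return w, m, v


# Usage: one update of a single scalar weight.
w, m, v = adam_step(w=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias correction makes the update size almost exactly equal to the learning rate regardless of the gradient magnitude, which is one reason the small value $ \alpha = 0.0001 $ in Table 2 yields cautious early updates.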
Publication history
  • Received:  2021-12-08
  • Revised:  2022-03-23
  • Published online:  2022-03-29
