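A Novel Beam Hopping Resource Allocation Scheme of Low Earth Orbit Satellite Based on Transfer Deep Reinforcement Learning

CHEN Qianbin; MA Shiqing; DUAN Ruiji; TANG Lun; LIANG Chengchao

doi: 10.11999/JEIT211457

School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Funds: The National Natural Science Foundation of China (62071078, 62001076), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601, KJQN-201900645)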

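About the authors:
CHEN Qianbin: male, professor and doctoral supervisor. His research interests include space-air-ground integrated networks, multimedia information processing and transmission, and heterogeneous cellular networks.

MA Shiqing: male, master's student. His research interests include space-air-ground integrated networks, satellite-terrestrial integration, and machine learning algorithms.

DUAN Ruiji: male, master's student. His research interests include space-air-ground integrated networks, satellite-terrestrial integration, and convex optimization algorithms.

TANG Lun: male, professor, Ph.D. His research interests include space-air-ground integrated networks, next-generation wireless communication networks, and software-defined wireless networks.

LIANG Chengchao: male, professor, Ph.D. His research interests include wireless communications, space-air-ground integrated networks, and (satellite) Internet architectures and protocols.

Corresponding author:
LIANG Chengchao, liangcc@cqupt.edu.cn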

CLC number: TN927
Publication history
- Received: 2021-12-08
- Revised: 2022-03-23
- Published online: 2022-03-29
- Issue date: 2023-02-07


Abstract: In Low Earth Orbit (LEO) satellite scenarios, traditional resource allocation schemes often leave specific cells with fewer resources than they demand. To address this problem, this paper proposes a beam hopping resource allocation scheme for LEO satellites based on Transfer Deep Reinforcement Learning (TDRL). First, jointly considering on-board buffer information, service arrivals, and channel state, an LEO satellite resource allocation optimization model that supports beam hopping is established with the goal of minimizing the average delay of on-board data packets. Second, to cope with the dynamics of the LEO satellite network, the dynamically and randomly varying communication resources and demands are taken into account, and the Deep Q Network (DQN) algorithm is adopted, with a neural network serving as the nonlinear function approximator. Furthermore, to enable and accelerate the convergence of the Deep Reinforcement Learning (DRL) algorithm on other target tasks, Transfer Learning (TL) is introduced: the scheduling knowledge learned by a source satellite is used to quickly find the beam scheduling and power allocation strategy of the target satellite. Simulation results show that the proposed algorithm optimizes time slot allocation during satellite service, reduces the average transmission delay of data packets, and effectively improves system throughput and resource utilization efficiency.

Keywords:
- Low Earth Orbit (LEO) satellite network
- Beam hopping
- Resource allocation
- Deep Reinforcement Learning (DRL)
- Transfer Learning (TL)

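Fig. 1 LEO satellite communication architecture based on beam hopping

Fig. 2 Architecture of the LEO-BH system based on TL-DRL

Fig. 3 State reconstruction process

Fig. 4 Deep neural network model

Fig. 5 Convergence performance of TL-DQN under different transfer rate factors

Fig. 6 System performance under regularly varying cell demand

Fig. 7 System performance versus service arrival rate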

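Algorithm 1 TL-DQN-based beam hopping scheme for LEO satellites
(1) Initialize the multi-beam LEO satellite network parameters, the cell parameters, and the experience replay pool ${\boldsymbol{D}}$ with capacity $N$
(2) Randomly initialize the parameters $w$ of the action-value network $Q$; initialize the parameters ${w^ - }$ of the target action-value network ${Q^ - }$ and set ${w^ - } = w$
(3) For learning episode ${n_{{\text{epochs}}}} = 1,2, \cdots ,{N_{{\text{epochs}}}}$ do
(4)   Set $\varepsilon = 1 - (0.5 + {n_{{\text{epochs}}}}/{N_{{\text{epochs}}}} \times 0.3)$ so that the exploration probability decreases gradually
(5)   Obtain the initial state $s({t_0})$, the local policy $\pi _{{t_0}}^{{\text{tg}}}(s({t_0}),a({t_0}))$, and the transferred external policy $\pi _{{t_0}}^{\text{s}}(s({t_0}),a({t_0}))$
(6)   For time $i = 0,1, \cdots ,\|{\mathbf{T}}\| - 1$ do
(7)     Generate a random probability $p$
(8)     If $p \le \varepsilon$:
(9)       The LEO satellite randomly selects an action $a({t_i})$ that satisfies the constraints
(10)     Else
(11)       Obtain the overall policy from Eq. (26) and, according to $\pi _{{t_0}}^{{\text{tg}}}(s({t_0}),a({t_0}))$, select the action $a({t_i}) = \arg {\max _{a({t_i})}}Q(s({t_i}),a({t_i});w)$ to perform LEO satellite beam scheduling and resource allocation; then update the environment state to $s({t_{i + 1}})$ and immediately receive the reward $r({t_i})$
(12)       Store the 4-tuple $(s({t_i}),a({t_i}),r({t_i}),s({t_{i + 1}}))$ in ${\boldsymbol{D}}$
(13)       Randomly sample a mini-batch of tuples $(s({t_i}),a({t_i}),r({t_i}),s({t_{i + 1}}))$ from ${\boldsymbol{D}}$
(14)       Compute the loss function with Eq. (19)
(15)       Compute the exponentially weighted first- and second-moment estimates with Eqs. (21) and (22), respectively
(16)       Following the Adam algorithm, compute the bias-corrected first- and second-moment estimates with Eqs. (23) and (24), respectively
(17)       Update the weights $w$ of the evaluation network $Q$ with Eq. (25)
(18)       Every $G$ steps, replace the parameters ${w^ - }$ of the target network ${Q^ - }$ with the parameters $w$ of the $Q$ network
(19)   End for
(20) End for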
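
Steps (4) and (7) to (11) of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the boolean feasibility mask, and the injectable `rng` object are assumptions introduced here. The greedy branch simply takes the feasible action with the largest Q-value and leaves out the policy combination of Eq. (26), which is not reproduced on this page.

```python
import random


def exploration_probability(n_epochs, total_epochs):
    """Step (4): epsilon = 1 - (0.5 + n_epochs / N_epochs * 0.3).

    Decreases linearly from 0.5 (n_epochs = 0) to 0.2 (n_epochs = N_epochs).
    """
    return 1.0 - (0.5 + n_epochs / total_epochs * 0.3)


def select_action(q_values, feasible, epsilon, rng=random):
    """Steps (7)-(11): epsilon-greedy choice restricted to feasible actions.

    q_values: list of Q(s, a; w), one entry per action index (hypothetical layout).
    feasible: list of bools marking actions that satisfy the constraints.
    rng:      any object with random() and choice(); defaults to the random module.
    """
    candidates = [a for a, ok in enumerate(feasible) if ok]
    if not candidates:
        raise ValueError("no feasible action in this state")
    p = rng.random()                      # step (7)
    if p <= epsilon:                      # step (8)
        return rng.choice(candidates)     # step (9): random feasible action
    # step (11), simplified: greedy action over the feasible set
    return max(candidates, key=lambda a: q_values[a])
```

With the 600 training episodes of Table 2, `exploration_probability` starts near 0.5 and ends at 0.2, which matches the exploration range listed there.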


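Table 1 LEO satellite scenario parameters

LEO satellite network parameter	Value	LEO satellite network parameter	Value
Satellite orbit altitude $h$	781 km	User receive antenna gain $G_r$	20 dBi
Number of satellite beams $K$	7	Noise power density $N_0$	-174 dBm/Hz
Total number of served cells $N$	49	Maximum satellite transmit power $P_{\text{tot}}$	20 dBW
Cell diameter $D$	667 km	Maximum transmit power per beam $P^{\max}$	18 dBW
Total channel bandwidth $B_w$	250 MHz	Half beam angle of the multi-beam antenna ${\theta _\alpha }$	${2^ \circ }$
Downlink operating frequency $f$	20 GHz	Data packet size $M$	50 kbit
Maximum satellite transmit antenna gain $G_m$	41.6 dBi	Service packet arrival rate ${\lambda _{{c_n}}}(t)$	[1,21]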
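
To give a feel for the scale of the Table 1 values, the sketch below computes a rough best-case downlink budget. It is only an illustration under simplifying assumptions that are not taken from the paper: the user sits at the sub-satellite point (slant range equal to the orbit altitude), the beam transmits at its maximum power and peak antenna gain over the full bandwidth, and atmospheric loss and inter-beam interference are ignored. All function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def nadir_snr_db(p_tx_dbw=18.0, g_tx_dbi=41.6, g_rx_dbi=20.0,
                 altitude_m=781e3, freq_hz=20e9,
                 n0_dbm_per_hz=-174.0, bandwidth_hz=250e6):
    """Best-case SNR in dB using the Table 1 defaults (assumptions in the lead-in)."""
    p_rx_dbm = (p_tx_dbw + 30.0) + g_tx_dbi + g_rx_dbi - fspl_db(altitude_m, freq_hz)
    noise_dbm = n0_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)
    return p_rx_dbm - noise_dbm


def shannon_rate_bps(snr_db, bandwidth_hz=250e6):
    """Shannon capacity B*log2(1 + SNR) for a linear SNR derived from snr_db."""
    return bandwidth_hz * math.log2(1.0 + 10.0 ** (snr_db / 10.0))


if __name__ == "__main__":
    snr = nadir_snr_db()
    print(f"FSPL: {fspl_db(781e3, 20e9):.1f} dB")
    print(f"SNR:  {snr:.1f} dB")
    print(f"Rate: {shannon_rate_bps(snr) / 1e9:.2f} Gbit/s")
```

Under these assumptions the free-space path loss is about 176.3 dB and the SNR is about 23 dB. Real links at the cell edge, with interference and additional losses, would sit below this bound.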


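Table 2 TL-DQN algorithm parameters

TL-DQN algorithm parameter	Value	TL-DQN algorithm parameter	Value
Number of training episodes $N_{\text{epochs}}$	600	${\beta _1}$ in the Adam optimizer	0.9
Time slots per episode $\|{\mathbf{T}}\|$	6000	${\beta _2}$ in the Adam optimizer	0.999
Experience pool capacity	5000	Dropout rate	0.2
Discount factor $\gamma$	0.9	Update period $G$ of the target ${Q^ - }$ network	100
Learning rate $\alpha$	0.0001	Activation function	ReLU
Mini-batch size $N_t$	10	Exploration probability $\varepsilon$	(0.2,0.5)
Optimizer	Adam	Transfer rate factor $\eta$	{0,0.2,0.5}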
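
The optimizer settings in Table 2 can be tied to steps (15) to (18) of Algorithm 1 with a short sketch. It assumes that Eqs. (21) to (25), which are not reproduced on this page, are the standard Adam recursions. The stability constant `EPS` and the helper names are assumptions introduced here, and plain Python lists stand in for the network weights.

```python
import math

# Hyperparameters from Table 2; EPS is an assumed numerical-stability constant.
ALPHA = 1e-4              # learning rate
BETA1 = 0.9               # first-moment decay
BETA2 = 0.999             # second-moment decay
EPS = 1e-8
TARGET_SYNC_PERIOD = 100  # G: steps between target-network updates


def adam_step(w, grad, m, v, t):
    """One standard Adam update; t is the 1-based step count.

    Returns the new (w, m, v) as lists, leaving the inputs unchanged.
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grad, m, v):
        mi = BETA1 * mi + (1.0 - BETA1) * gi        # first-moment estimate
        vi = BETA2 * vi + (1.0 - BETA2) * gi * gi   # second-moment estimate
        m_hat = mi / (1.0 - BETA1 ** t)             # bias correction
        v_hat = vi / (1.0 - BETA2 ** t)
        new_w.append(wi - ALPHA * m_hat / (math.sqrt(v_hat) + EPS))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


def should_sync_target(step):
    """Step (18): copy w into the target network every G steps."""
    return step % TARGET_SYNC_PERIOD == 0
```

On the first step the bias correction makes the update magnitude close to the learning rate regardless of the gradient scale, which is why a small value such as 0.0001 still produces steady early progress.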

