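A Resource Allocation Algorithm for Space-Air-Ground Integrated Network Based on Deep Reinforcement Learning

LIU Xuefang; MAO Weihao; YANG Qinghai

doi: 10.11999/JEIT231016

School of Telecommunications Engineering, Xidian University, Xi'an 710071, China

Funds: The National Key Research and Development Program of China (2020YFB1807700)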

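Author biographies:
LIU Xuefang: female, associate professor and master's supervisor. Research interests: artificial-intelligence communications and space-air-ground integrated networks.

MAO Weihao: male, master's student. Research interest: resource allocation in space-air-ground integrated networks.

YANG Qinghai: male, professor and doctoral supervisor. Research interests: autonomous communication networks, information/network convergence, and real-time machine learning.

Corresponding author:
MAO Weihao, maowh@stu.xidian.edu.cn

CLC number: TN929.5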
Publication history
- Received: 2023-09-18
- Revised: 2024-01-19
- Available online: 2024-01-31
- Published: 2024-07-29

Abstract: The Space-Air-Ground Integrated Network (SAGIN) can effectively meet the communication needs of diverse service types by improving the resource utilization of the terrestrial network, but the adaptability and robustness of the system and the Quality of Service (QoS) of different users have been neglected. To address this problem, a Deep Reinforcement Learning (DRL) resource allocation algorithm for urban and suburban communications under the SAGIN architecture is proposed. Based on the Reference Signal Received Power (RSRP) defined in the 3rd Generation Partnership Project (3GPP) standard, and taking terrestrial co-channel interference into account, an optimization problem that maximizes the downlink throughput of the system's users is formulated, with the time-frequency resources of the base stations in the different domains as constraints. The problem is solved with the Deep Q-Network (DQN) algorithm, for which a reward function is defined that jointly accounts for users' QoS requirements, system adaptability, and system robustness. Simulation results for a scenario combining autonomous-vehicle, immersive, and ordinary mobile terminal services show that, after 2000 iterations, the reward function value that characterizes system performance is 39.1% higher than that of the greedy algorithm. For the autonomous-vehicle service, resource allocation with DQN reduces packet loss by 38.07% on average and delay by 6.05% relative to the greedy algorithm.

Keywords: Space-Air-Ground Integrated Network (SAGIN); Resource allocation; Deep Reinforcement Learning (DRL); Deep Q-Network (DQN)

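Figure 1 SAGIN architecture

Figure 2 User service under co-channel interference

Figure 3 Flowchart of the resource allocation algorithm based on the DQN deep reinforcement learning algorithm

Figure 4 Comparison of system reward under different algorithms

Figure 5 Geographic locations of base stations and users after 2000 iterations

Figure 6 Base station transmission rates under different algorithms after 2000 iterations

Figure 7 Comparison of system reward under different algorithms after co-channel interference is removed

Figure 8 Packet loss rate of the autonomous vehicle under different algorithms

Figure 9 Delay of the autonomous vehicle under different algorithms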

Algorithm 1 DQN resource allocation procedure under SAGIN

Input: experience replay pool $D$ with capacity $N$; estimation network $Q$ with random parameters $\theta$; target network $Q'$ with parameters $\theta'$, $\theta' = \theta$; discount factor $\gamma$
Output: action vector ${\boldsymbol{a}}_t$
for episode $= 1, \ldots, M$ do:
  Initialize the environment state vector ${\boldsymbol{s}}_t$
  for $t = 1, \ldots, T$ do:
    With probability $\varepsilon$ select a random action ${\boldsymbol{a}}_t$; otherwise, with probability $1 - \varepsilon$, select ${\boldsymbol{a}}_t = \arg\max_{\boldsymbol{a}} Q({\boldsymbol{s}}_t, {\boldsymbol{a}}; \theta)$
    Execute action ${\boldsymbol{a}}_t$, move to state ${\boldsymbol{s}}_{t+1}$, and receive reward $r_t$
    Store $({\boldsymbol{s}}_t, {\boldsymbol{a}}_t, r_t, {\boldsymbol{s}}_{t+1})$ in the experience pool $D$
    Sample a mini-batch of transitions $({\boldsymbol{s}}_{t'}, {\boldsymbol{a}}_{t'}, r_{t'}, {\boldsymbol{s}}_{t'+1})$ uniformly at random from $D$
    Set $y_{t'} = \begin{cases} r_{t'}, & \text{if the episode terminates at step } t'+1 \\ r_{t'} + \gamma \max_{{\boldsymbol{a}}'} Q'({\boldsymbol{s}}_{t'+1}, {\boldsymbol{a}}'; \theta'), & \text{otherwise} \end{cases}$
    Update the network parameters $\theta$ by gradient descent on the loss $L(\theta) = (y_{t'} - Q({\boldsymbol{s}}_{t'}, {\boldsymbol{a}}_{t'}; \theta))^2$
    Update the target network: $Q' = Q$
  end for
end for
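
The inner-loop steps of the algorithm listing can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `LinearQ` is a hypothetical linear stand-in for the estimation and target networks, transitions carry an extra `done` flag to mark the terminal case of the target $y_{t'}$, and the target network is synchronised after every update because that is how the listing reads (many DQN implementations synchronise only every fixed number of steps).

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_target, done, gamma):
    """y = r if the episode ended, else r + gamma * max_a Q'(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


class LinearQ:
    """Hypothetical stand-in for a Q-network: Q(s, a) = w[a] . s"""

    def __init__(self, state_dim, n_actions, lr=1e-3, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                  for _ in range(n_actions)]
        self.lr = lr

    def predict(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def update(self, batch, targets):
        """One gradient-descent step on the mean of (y - Q(s, a; theta))^2."""
        errors = [y - self.predict(s)[a]
                  for (s, a, _, _, _), y in zip(batch, targets)]
        n = len(batch)
        for (s, a, _, _, _), e in zip(batch, errors):
            self.w[a] = [w + self.lr * 2.0 * e * si / n
                         for w, si in zip(self.w[a], s)]
        return sum(e * e for e in errors) / n

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]


def train_step(q, q_target, replay, batch_size, gamma, rng):
    """Sample a mini-batch, build targets with Q', update Q, then set Q' = Q."""
    batch = rng.sample(list(replay), batch_size)
    targets = [td_target(r, q_target.predict(s_next), done, gamma)
               for (_, _, r, s_next, done) in batch]
    loss = q.update(batch, targets)
    q_target.copy_from(q)
    return loss
```

A replay pool of capacity $N$ can be modelled as `deque(maxlen=N)` holding `(state, action, reward, next_state, done)` tuples; `train_step` is then called once per time step after the new transition has been appended.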

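Table 1 Main simulation parameters for SAGIN resource allocation

Parameter	Value
Satellite carrier frequency $f_{\text{c}}$ (GHz)	28.4
Satellite bandwidth $B_{\text{w}}$ (MHz)	220
Satellite effective isotropic radiated power $\text{EIRP}$ (dBW)	62
Satellite path loss $\text{PL}$ (dB)	188.4
Satellite atmospheric loss $\text{AL}$ (dB)	0.1
Satellite $G/T$ (dB/K)	9.7
UAV carrier frequency $f_{\text{c}}$ (MHz)	1000
UAV bandwidth $B_{\text{w}}$ (MHz)	30
UAV antenna gain $G$ (dBi)	16
UAV transmitter antenna height $h_{\text{b}}$ (m)	50
UAV subcarrier transmit power $P$ (dB)	20
UAV feeder loss $\text{FL}$ (dB)	4
Terrestrial base station carrier frequency $f_{\text{c}}$ (MHz)	1700
Terrestrial base station antenna gain $G$ (dBi)	5
Terrestrial base station subcarrier transmit power $P$ (dB)	20
Terrestrial base station feeder loss $\text{FL}$ (dB)	1
Terrestrial base station transmitter antenna height $h_{\text{b}}$ (m)	40
User receiver antenna height $h_{\text{m}}$ (m)	1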
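
To give a feel for what the satellite rows of Table 1 imply, the following sketch plugs them into the textbook downlink budget $C/N_0 = \text{EIRP} - \text{PL} - \text{AL} + G/T - 10\log_{10}k$, where $k$ is Boltzmann's constant. This page does not reproduce the paper's own link-budget equations, so the choice of formula and the function names are assumptions made for illustration only.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of Boltzmann's constant


def carrier_to_noise_density_dbhz(eirp_dbw, path_loss_db, atmos_loss_db, g_over_t_dbk):
    """Textbook downlink budget: C/N0 = EIRP - PL - AL + G/T - 10*log10(k)."""
    k_db = 10.0 * math.log10(BOLTZMANN_J_PER_K)  # about -228.6 dBW/K/Hz
    return eirp_dbw - path_loss_db - atmos_loss_db + g_over_t_dbk - k_db


def carrier_to_noise_db(cn0_dbhz, bandwidth_hz):
    """C/N over the full bandwidth: C/N0 minus 10*log10(B)."""
    return cn0_dbhz - 10.0 * math.log10(bandwidth_hz)


# Satellite values taken from Table 1.
cn0 = carrier_to_noise_density_dbhz(62.0, 188.4, 0.1, 9.7)
cn = carrier_to_noise_db(cn0, 220e6)
print(f"C/N0 = {cn0:.1f} dBHz, C/N = {cn:.1f} dB")
```

With the Table 1 values this gives a carrier-to-noise density of about 111.8 dBHz, or roughly 28.4 dB of carrier-to-noise ratio across the full 220 MHz satellite bandwidth.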

Table 2 DQN algorithm parameters

Parameter	Value
$t_{\text{duration}}$ (s)	20
$t_{\text{sample}}$ (s)	0.01
episodes	2000
ITER	2000
Learning rate	1e-3
Discount factor $\gamma$	0.95
batch size	100
memory size	5e5
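
The values in Table 2 can be cross-checked with a little arithmetic. The sketch below assumes (the page does not state this) that one decision step is taken per sampling interval, so that the number of steps per episode is $t_{\text{duration}} / t_{\text{sample}}$, and that one transition is stored per step. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Hypothetical container for the Table 2 values."""
    t_duration_s: float = 20.0
    t_sample_s: float = 0.01
    episodes: int = 2000
    learning_rate: float = 1e-3
    gamma: float = 0.95
    batch_size: int = 100
    memory_size: int = int(5e5)

    @property
    def steps_per_episode(self) -> int:
        # Assumption: one decision step per sampling interval.
        return round(self.t_duration_s / self.t_sample_s)

    @property
    def total_transitions(self) -> int:
        # Assumption: one stored transition per step.
        return self.episodes * self.steps_per_episode

    @property
    def replay_coverage(self) -> float:
        """Fraction of all transitions the replay pool can hold at once."""
        return self.memory_size / self.total_transitions


cfg = DQNConfig()
print(cfg.steps_per_episode, cfg.total_transitions, cfg.replay_coverage)
```

Under these assumptions the step count per episode matches the ITER value of 2000, and the replay pool holds at most one eighth of the transitions generated over a full training run, so older experience is overwritten as training proceeds.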

Table 3 User classification in the SAGIN resource allocation simulation

Service	Label	User speed (m/s)	Service characteristics	Location	Downlink rate requirement (Mbit/s)	Service level $\alpha$
Immersive services (e.g., AR, HD video)	$\text{UE}^0, \text{UE}^1, \text{UE}^2$	0	High bandwidth, stationary	Urban	15	1
Autonomous vehicle	$\text{UE}^3$	20	Very high bandwidth, high mobility	Urban	25	3
Ordinary mobile terminal communication	$\text{UE}^4, \text{UE}^5$	1.2	Low bandwidth, low mobility	Suburban	3	1

Table 4 Users receiving the largest resource allocation from each base station under different algorithms

Base station	$\text{BS}^0$	$\text{BS}^1$	$\text{BS}^2$	$\text{BS}^3$
DQN algorithm	$\text{UE}^5$	$\text{UE}^4$	$\text{UE}^1, \text{UE}^3$	$\text{UE}^0, \text{UE}^2$
Greedy algorithm	$\text{UE}^4, \text{UE}^5$	$-$	$\text{UE}^1, \text{UE}^2, \text{UE}^3$	$\text{UE}^0$
Random algorithm	$\text{UE}^0, \text{UE}^5$	$\text{UE}^4$	$\text{UE}^3$	$\text{UE}^1, \text{UE}^2$

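Table 5 Decrease in system reward after co-channel interference is introduced, by algorithm

Algorithm	Decrease in system reward $R$
DQN algorithm	289.97
Greedy algorithm	455.16
Random algorithm	967.49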
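
The absolute drops in Table 5 are easier to compare as ratios. The short sketch below computes how much smaller the DQN reward loss is relative to each baseline; the helper name `relative_reduction` is hypothetical, and only the three numbers come from the table.

```python
# Reward lost when co-channel interference is introduced (Table 5).
reward_drop = {"DQN": 289.97, "greedy": 455.16, "random": 967.49}


def relative_reduction(baseline, value):
    """Fraction by which `value` is smaller than `baseline`."""
    return (baseline - value) / baseline


for name in ("greedy", "random"):
    saving = relative_reduction(reward_drop[name], reward_drop["DQN"])
    print(f"DQN loses {saving:.1%} less reward than the {name} algorithm")
```

By this measure the DQN allocation gives up roughly 36% less reward than the greedy algorithm and roughly 70% less than the random algorithm when interference is present.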
