A Joint Resource Allocation Method for D2D Communication Based on Multi-agent Deep Reinforcement Learning
Abstract: As a short-range communication technology, Device-to-Device (D2D) communication can greatly relieve the load on cellular base stations and improve spectrum utilization. However, deploying D2D directly in licensed or unlicensed bands inevitably causes severe interference to existing users. The resource allocation of D2D communication jointly deployed in licensed and unlicensed bands is usually modeled as a combinatorial optimization problem with mixed-integer nonlinear constraints, which is difficult to solve with traditional optimization methods. To address this challenging problem, a joint resource allocation method for D2D communication based on multi-agent deep reinforcement learning is proposed. In this algorithm, each D2D transmitter in the cellular network acts as an agent that, through deep reinforcement learning, intelligently selects either access to the unlicensed channel or the optimal licensed channel, together with its transmit power. Through the information fed back to the cellular base station by the D2D pairs that choose the unlicensed channel (accessed via the Listen Before Talk (LBT) mechanism), the base station obtains WiFi network throughput information in a non-cooperative manner, so that the algorithm can be executed in a heterogeneous environment while the Quality of Service (QoS) of WiFi users is guaranteed. Compared with Multi-Agent Deep Q-Network (MADQN), Multi-Agent Q-Learning (MAQL) and random baseline algorithms, the proposed algorithm achieves the highest throughput while guaranteeing the QoS of both WiFi users and cellular users.
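As an illustration of the joint channel and power selection described above, the sketch below enumerates one possible discrete action space for a D2D agent: a licensed-channel index paired with a power level, plus a single action for LBT-based unlicensed access. This encoding and the counts (taken from Table 1) are assumptions for illustration, not the paper's exact action design.

```python
# Illustrative sketch (assumed encoding): each discrete action is either a
# (licensed channel, power level) pair or access to the unlicensed channel.
from itertools import product

N_LICENSED_CHANNELS = 5   # licensed uplink channels (one per cellular user, Table 1)
N_POWER_LEVELS = 5        # D2D transmit power levels (Table 1)

# Indices 0 .. N_LICENSED_CHANNELS*N_POWER_LEVELS-1 select a licensed
# channel/power pair; the final index selects LBT-based unlicensed access.
LICENSED_ACTIONS = list(product(range(N_LICENSED_CHANNELS), range(N_POWER_LEVELS)))
UNLICENSED_ACTION = len(LICENSED_ACTIONS)
N_ACTIONS = UNLICENSED_ACTION + 1

def decode_action(a: int):
    """Map an action index to ('licensed', channel, power_level) or ('unlicensed',)."""
    if a == UNLICENSED_ACTION:
        return ("unlicensed",)
    channel, power_level = LICENSED_ACTIONS[a]
    return ("licensed", channel, power_level)
```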
Algorithm 1  Joint resource allocation method for D2D communication based on multi-agent deep reinforcement learning
(1) Set the hyperparameters: discount factor $ \gamma $, actor-network learning rate $ {\alpha _{\text{a}}} $, critic-network learning rate $ {\alpha _{\text{c}}} $;
(2) Randomly initialize the actor-network parameters $ {\theta ^0},{\theta ^1}, \cdots ,{\theta ^M} $ and the critic-network parameters $ {\omega ^0},{\omega ^1}, \cdots ,{\omega ^M} $;
(3) All agents (D2D pairs) obtain the initial state ${S_0} = \left\{ {s_0^1,s_0^2, \cdots ,s_0^M} \right\}$;
(4) for $t = 1,2, \cdots ,T$ do
(5)  for $i = 1,2, \cdots ,M$ do
(6)   The $i$-th D2D pair feeds its own observation $o_t^i$ into its actor (policy) network and selects an action $a_t^i$ according to the current policy $ {\pi _{\theta _t^i}}\left( {a_t^i\left| {o_t^i;\theta _t^i} \right.} \right) $;
(7)  end for
(8)  All D2D transmitters execute the actions ${A_t} = \left\{ {a_t^1,a_t^2, \cdots ,a_t^M} \right\}$ and obtain the rewards ${R_t} = \left\{ {r_t^1,r_t^2, \cdots ,r_t^M} \right\}$ and the next state ${S_{t + 1}} = \left\{ {o_{t + 1}^1,o_{t + 1}^2, \cdots ,o_{t + 1}^M} \right\}$;
(9)  for $i = 1,2, \cdots ,M$ do
(10)  The $i$-th D2D pair feeds its observation $o_{t + 1}^i$ into its critic network and computes $ {V^i}\left( {o_{t + 1}^i} \right) $; computes the TD error $ \delta _t^i(o_t^i) = r_t^i + \gamma {V^i}\left( {o_{t + 1}^i} \right) - {V^i}(o_t^i) $; updates the critic-network parameters $\omega _{t + 1}^i = \omega _t^i + {\alpha _{\text{c}}}{\nabla _{{\omega ^i}}}V_{{\omega ^i}}^i\left( {o_t^i} \right){\delta ^i}(o_t^i)$; and updates the actor-network parameters $\theta _{t + 1}^i = \theta _t^i + {\alpha _{\text{a}}}{\nabla _{{\theta ^i}}}\ln \pi _{{\theta ^i}}^i\left( {a_t^i\left| {o_t^i} \right.} \right){\delta ^i}(o_t^i)$;
(11)  end for
(12)  All D2D users update their state: ${S_t} = {S_{t + 1}}$;
(13) end for
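The per-agent updates in Algorithm 1 can be realized with a small actor-critic implementation. The following PyTorch sketch follows the TD-error based critic and actor updates of step (10); the network sizes, optimizer choice, and hyperparameter values are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

GAMMA = 0.95        # discount factor gamma (assumed value)
LR_ACTOR = 1e-4     # actor learning rate alpha_a (assumed value)
LR_CRITIC = 1e-3    # critic learning rate alpha_c (assumed value)

class D2DAgent:
    """One D2D transmitter: an actor pi_theta(a|o) and a critic V_omega(o)."""

    def __init__(self, obs_dim: int, n_actions: int):
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        # Plain SGD mirrors the gradient-step form of the updates in step (10).
        self.opt_a = torch.optim.SGD(self.actor.parameters(), lr=LR_ACTOR)
        self.opt_c = torch.optim.SGD(self.critic.parameters(), lr=LR_CRITIC)

    def act(self, obs: torch.Tensor) -> int:
        """Sample an action from the current policy pi_theta(a|o) (step (6))."""
        probs = torch.softmax(self.actor(obs), dim=-1)
        return torch.distributions.Categorical(probs).sample().item()

    def update(self, obs, action, reward, next_obs):
        """TD-error based critic and actor updates (step (10))."""
        v = self.critic(obs).squeeze()
        v_next = self.critic(next_obs).squeeze().detach()
        td_error = reward + GAMMA * v_next - v            # delta_t^i
        # Descending 0.5*delta^2 is equivalent to
        # omega <- omega + alpha_c * delta * grad_omega V(o).
        critic_loss = 0.5 * td_error.pow(2)
        self.opt_c.zero_grad()
        critic_loss.backward()
        self.opt_c.step()
        # Policy-gradient ascent:
        # theta <- theta + alpha_a * delta * grad_theta ln pi(a|o).
        log_prob = torch.log_softmax(self.actor(obs), dim=-1)[action]
        actor_loss = -log_prob * td_error.detach()
        self.opt_a.zero_grad()
        actor_loss.backward()
        self.opt_a.step()
```

In this sketch, each of the $M$ D2D pairs holds its own agent instance and performs the updates using only its local observation and reward, matching the decentralized loop of steps (4)-(13).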
Table 1  Parameter configuration
Cell radius: 500 m
Noise power spectral density: –174 dBm/Hz
Maximum transmit power of cellular users: 23 dBm
Shadow fading standard deviation: 8 dB
Number of cellular users: 5
Number of D2D transmit power levels: 5
D2D pair distance: 25 m
SNR threshold for cellular user QoS: 6 dB
Number of D2D users: 5–45
Path loss exponent: 4
Licensed uplink channel bandwidth: 5 MHz
SNR threshold for D2D user QoS: 6 dB
Number of WiFi users: 1–11
Maximum transmit power of D2D users: 23 dBm
Unlicensed channel bandwidth: 20 MHz
$ S_{\min }^W $: 6 Mbit/s
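When reproducing the simulation, the Table 1 settings can be gathered into a single configuration object. The container below is an assumed structure (field names are illustrative); the values are copied from the table.

```python
# Assumed container for the Table 1 simulation parameters; field names are
# illustrative, values are copied from the table.
from dataclasses import dataclass

@dataclass
class SimulationConfig:
    cell_radius_m: float = 500.0
    noise_psd_dbm_per_hz: float = -174.0
    cellular_max_tx_power_dbm: float = 23.0
    shadowing_std_db: float = 8.0
    n_cellular_users: int = 5
    n_d2d_power_levels: int = 5
    d2d_pair_distance_m: float = 25.0
    cellular_snr_threshold_db: float = 6.0
    n_d2d_users_range: tuple = (5, 45)      # swept from 5 to 45
    path_loss_exponent: float = 4.0
    licensed_uplink_bandwidth_mhz: float = 5.0
    d2d_snr_threshold_db: float = 6.0
    n_wifi_users_range: tuple = (1, 11)     # swept from 1 to 11
    d2d_max_tx_power_dbm: float = 23.0
    unlicensed_bandwidth_mhz: float = 20.0
    wifi_min_throughput_mbps: float = 6.0   # S_min^W from Table 1
```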