A Dynamic Pre-Deployment Strategy of UAVs Based on Multi-Agent Deep Reinforcement Learning

TANG Lun, LI Zhixuan, PU Hao, WANG Zhiping, CHEN Qianbin

Citation: TANG Lun, LI Zhixuan, PU Hao, WANG Zhiping, CHEN Qianbin. A Dynamic Pre-Deployment Strategy of UAVs Based on Multi-Agent Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2007-2015. doi: 10.11999/JEIT220513

doi: 10.11999/JEIT220513
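    Author biographies:

    TANG Lun: male, professor and doctoral supervisor. His main research interests include next-generation wireless communication networks, heterogeneous cellular networks, and software-defined networking.

    LI Zhixuan: female, master's student. Her research interests include intelligent networks, edge computing, and UAV communications.

    PU Hao: male, master's student. His research interests include resource allocation and collaboration mechanisms for edge intelligent computing, and UAVs.

    WANG Zhiping: male, master's student. His research interests include collaboration mechanisms for edge intelligent computing and communication optimization for federated learning.

    CHEN Qianbin: male, professor and doctoral supervisor. His main research interests include personal communications, multimedia information processing and transmission, and heterogeneous cellular networks.

    Corresponding author:

    PU Hao puhao19970525@163.com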

  • Chinese Library Classification (CLC) number: TN929.5

Funds: The National Natural Science Foundation of China (62071078), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601), Sichuan and Chongqing Key R&D Projects (2021YFQ0053)
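  • Abstract: Traditional optimization algorithms are too complex for dynamically deploying communication Unmanned Aerial Vehicles (UAVs) over long time scales, and they are difficult to match with dynamic environment information. To address these shortcomings, this paper proposes a dynamic UAV pre-deployment strategy based on Multi-Agent Deep Reinforcement Learning (MADRL). First, a deep spatio-temporal network model predicts the users' expected rate demands so as to capture dynamic environment information; a notion of user satisfaction is defined to characterize the fairness of the service that users receive from the UAVs; and an optimization model is established with the objectives of maximizing the long-term overall user satisfaction and minimizing the UAV movement and transmission energy consumption. Second, the model is transformed into a Partially Observable Markov Game (POMG), and an MADRL-based H-MADDPG algorithm is proposed to find the optimal decisions on trajectory planning, user association, and power allocation in this POMG. H-MADDPG uses a hybrid network structure to extract features from multi-modal inputs and adopts centralized training with decentralized execution to train and execute decisions efficiently. Finally, simulation results demonstrate the effectiveness of the proposed algorithm.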
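    The two-part objective in the abstract (more user satisfaction, less transmit and movement energy) can be illustrated as a scalarized per-step reward. The sketch below is not the paper's formulation: the satisfaction measure (the ratio of achieved to predicted rate, capped at 1), the linear penalty terms, the role assigned to $ \lambda $, and all function and parameter names are assumptions made for illustration. Only the weight symbols $ \varphi $, $ \lambda $, $ \beta $ are taken from the paper's parameter tables.

    ```python
    from typing import Sequence


    def user_satisfaction(achieved: float, expected: float) -> float:
        """Hypothetical satisfaction: share of the predicted demand that is met, capped at 1."""
        if expected <= 0:
            return 1.0  # a user with no predicted demand is treated as fully satisfied
        return min(achieved / expected, 1.0)


    def step_reward(
        achieved: Sequence[float],
        expected: Sequence[float],
        tx_power_w: float,
        move_dist_m: float,
        phi: float,
        lam: float,
        beta: float,
    ) -> float:
        """Assumed weighted sum: reward satisfaction, penalize transmit power and movement."""
        total_satisfaction = sum(
            user_satisfaction(a, e) for a, e in zip(achieved, expected)
        )
        return phi * total_satisfaction - lam * tx_power_w - beta * move_dist_m
    ```

    For example, `step_reward([2.0, 1.0], [2.0, 4.0], tx_power_w=10.0, move_dist_m=100.0, phi=1e-1, lam=1e-3, beta=1e-3)` gives a total satisfaction of 1.25 and a reward of about 0.015. Capping each user's satisfaction at 1 means that over-serving one user cannot compensate for under-serving another, which is one simple way to encode the fairness the abstract refers to.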
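  • Figure 1  Actor network structure

    Figure 2  Critic network structure

    Figure 3  Training comparison under different learning rates

    Figure 4  Convergence comparison of the algorithms

    Figure 5  Performance comparison of the algorithms

    Figure 6  Comparison of cumulative transmit power consumption

    Figure 7  Comparison of cumulative path length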

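    Algorithm 1  H-MADDPG algorithm
     Input: $ {{\hat {\boldsymbol X}}^{{\text{Tr}}}}(t) $, $ {{\hat {\boldsymbol X}}^{\text{U}}}(t) $ ($ t \in {\mathcal{T}} $), maximum number of episodes E, maximum number of time steps T, $ \gamma $, $ \tau $, I, maximum number of epochs K
     Output: $ {\omega _m} $, $ {\omega '_m} $, $ {\theta _m} $, $ {\theta '_m} $
     Randomly initialize the online/target critic networks and the online/target actor networks of all agents
     for episode = 1 to E:
       Initialize the global state s and the experience replay pools of all agents
       for t = 1 to T:
         All agents execute actions based on their observed states
         The global state transitions from s to $ s' $; all agents receive their rewards and store the samples in the experience replay pools
         if the experience replay pool is full:
           for m = 1 to M:
             for epoch = 1 to K:
               Repeatedly sample mini-batches of I samples until every sample has been used for training
               For each mini-batch, update $ {\omega _m} $ by Eqs. (12) and (13), and update $ {\theta _m} $ by Eqs. (14) and (15)
             end for
           end for
           Clear the experience replay pool
         end if
         $ s \leftarrow s' $
         Update $ {\omega '_m} $ and $ {\theta '_m} $ by Eqs. (16) and (17)
       end for
     end for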
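    The inner part of Algorithm 1 (train on the full pool in mini-batches of I samples for K epochs per agent, clear the pool, then update the target networks) can be sketched as follows. This is a minimal illustration, not the authors' code. Eqs. (16) and (17) are not reproduced on this page, so the Polyak-averaging form of `soft_update` is an assumption based on common MADDPG practice, and the function names, the dictionary-of-floats parameter representation, and the `update` callbacks are hypothetical.

    ```python
    import random
    from typing import Callable, Dict, List


    def minibatches(pool: list, batch_size: int, rng: random.Random) -> List[list]:
        """Shuffle a copy of the pool and split it so every sample is used exactly once."""
        order = list(pool)
        rng.shuffle(order)
        return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


    def soft_update(target: Dict[str, float], online: Dict[str, float], tau: float) -> None:
        """Assumed target update: target <- tau * online + (1 - tau) * target."""
        for name, value in online.items():
            target[name] = tau * value + (1.0 - tau) * target[name]


    def train_when_full(
        pool: list,
        agent_updates: List[Callable[[list], None]],
        epochs: int,
        batch_size: int,
        rng: random.Random,
    ) -> int:
        """For each agent, run `epochs` passes over the whole pool, then clear the pool."""
        n_updates = 0
        for update in agent_updates:              # for m = 1..M
            for _ in range(epochs):               # for epoch = 1..K
                for batch in minibatches(pool, batch_size, rng):
                    update(batch)                 # critic update, then actor update
                    n_updates += 1
        pool.clear()                              # samples are discarded after training
        return n_updates
    ```

    With a pool of 12 samples, a batch size of 5, 3 epochs, and 2 agents, each pass yields 3 mini-batches (5, 5, and 2 samples), so `train_when_full` performs 18 updates before the pool is emptied. Reusing every stored sample for K epochs and then discarding the pool differs from the uniform random replay of standard MADDPG, which is why the schedule is spelled out here.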

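    Table 1  Simulation parameter settings

    Parameter | Value | Parameter | Value
    Carrier frequency $ f_c $ | 5 GHz | Environment constants a/b | 9.6/0.2
    Antenna gain G | 10 dB | Weight coefficients $ \varphi $/$ \lambda $/$ \beta $ | $ 10^{-1} $/$ 10^{-3} $/$ 10^{-1} $
    Total bandwidth B | 10 MHz | Penalty coefficients $ {\eta _1} $/$ {\eta _2} $/$ {\eta _3} $ | $ 10^{-2} $/$ 10^{-1} $/$ 10^{2} $
    Noise power spectral density $ N_0 $ | -174 dBm/Hz | Number of UAVs M | 3
    $ {\mu _{{\text{LoS}}}} $/$ {\sigma _{{\text{LoS}}}} $/$ {\mu _{{\text{NLoS}}}} $/$ {\sigma _{{\text{NLoS}}}} $ | 1.6/8.41/23/33.78 | $ P_{\max} $/$ d_{\max} $/$ d_{\min} $ | 30 W/1000 m/100 m
    Region length L/width W | 10 (×200 m)/10 (×200 m) | Training parameters E/T/K/I/$ \tau $ | 1000/200/100/5/0.1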
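    Two quantities can be worked out from the values in Table 1. The thermal noise power over the full band follows directly from $ N_0 $ and B: -174 dBm/Hz + 10·log10(10^7 Hz) = -104 dBm. The line-of-sight probability below uses the widely used sigmoid air-to-ground channel model; that the paper applies exactly this form to its environment constants a and b is an assumption, and the function names are illustrative.

    ```python
    import math


    def noise_power_dbm(n0_dbm_per_hz: float, bandwidth_hz: float) -> float:
        """Noise power in dBm over a bandwidth, from the power spectral density."""
        return n0_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


    def dbm_to_watt(p_dbm: float) -> float:
        """Convert a power level from dBm to watts."""
        return 10.0 ** ((p_dbm - 30.0) / 10.0)


    def los_probability(elevation_deg: float, a: float = 9.6, b: float = 0.2) -> float:
        """Assumed sigmoid LoS model: 1 / (1 + a * exp(-b * (elevation - a)))."""
        return 1.0 / (1.0 + a * math.exp(-b * (elevation_deg - a)))


    full_band_noise_dbm = noise_power_dbm(-174.0, 10e6)    # about -104 dBm
    full_band_noise_w = dbm_to_watt(full_band_noise_dbm)   # about 4e-14 W
    ```

    If the 10 MHz band is divided among the users served by a UAV, the per-user noise power is lower by the same ratio as the per-user bandwidth, so the bandwidth argument should be the share actually allocated to the link. Under the assumed model, the LoS probability rises monotonically with the elevation angle, from about 0.09 at 9.6 degrees toward 1 near 90 degrees.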

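    Table 2  Comparison of the H-MADDPG and MADDPG network structures

    Columns: H-MADDPG structure, parameters; MADDPG structure, parameters

    Actor network
      Convolutional layer 1: 32 kernels of size 3×3
      Pooling layer 1: 2×2 average pooling
      Convolutional layer 2: 16 kernels of size 3×3
      Pooling layer 2: 2×2 average pooling
      Fully connected layer 1: 256 neurons (H-MADDPG), 512 neurons (MADDPG)
      Fully connected layer 2: 128 neurons (H-MADDPG), 256 neurons (MADDPG)
      Fully connected layer 3: 50 neurons

    Critic network
      Convolutional layer 1: 32 kernels of size 3×3
      Pooling layer 1: 2×2 average pooling
      Convolutional layer 2: 16 kernels of size 3×3
      Pooling layer 2: 2×2 average pooling
      Fully connected layer 1: 512 neurons (H-MADDPG), 1024 neurons (MADDPG)
      Fully connected layer 2: 256 neurons (H-MADDPG), 512 neurons (MADDPG)
      Fully connected layer 3: 128 neurons (H-MADDPG), 200 neurons (MADDPG)
      Fully connected layer 3: 20 neurons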
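    To see how much the convolution and pooling stack of Table 2 compresses a grid-shaped input before the fully connected layers, the spatial sizes can be traced layer by layer. The sketch below rests on several assumptions that the page does not state: a square 10×10 input grid (suggested by the 10×10 region of 200 m cells in Table 1), stride 1 for the convolutions, pooling with stride equal to its window, and rounding down. The padding is left as a parameter because it is not given, and the function names are illustrative.

    ```python
    def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
        """Spatial output size of a convolution along one dimension."""
        return (size + 2 * padding - kernel) // stride + 1


    def pool_out(size: int, window: int) -> int:
        """Spatial output size of non-overlapping pooling (stride equals window)."""
        return size // window


    def flattened_features(grid: int, padding: int) -> int:
        """Number of features entering the first fully connected layer."""
        s = conv_out(grid, 3, padding=padding)  # convolutional layer 1: 32 kernels, 3x3
        s = pool_out(s, 2)                      # pooling layer 1: 2x2 average pooling
        s = conv_out(s, 3, padding=padding)     # convolutional layer 2: 16 kernels, 3x3
        s = pool_out(s, 2)                      # pooling layer 2: 2x2 average pooling
        return 16 * s * s                       # 16 channels after convolutional layer 2


    features_same_padding = flattened_features(10, padding=1)  # sizes 10 -> 10 -> 5 -> 5 -> 2
    features_no_padding = flattened_features(10, padding=0)    # sizes 10 -> 8 -> 4 -> 2 -> 1
    ```

    Under these assumptions the convolutional branch produces 64 features with padding 1 and 16 features without padding, far fewer than the 100 raw grid cells per input channel. That is consistent with the smaller fully connected layers listed for H-MADDPG, although the actual input shape and padding in the paper may differ.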

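    Table 3  Weight coefficients used by each algorithm

    Algorithm | Weight coefficient $ \varphi $ of overall user satisfaction | Weight coefficient $ \beta $ of UAV power consumption per unit movement
    H-MADDPG1 | $ 1 \times 10^{-1} $ | $ 1 \times 10^{-3} $
    H-MADDPG2 | $ 0.9 \times 10^{-1} $ | $ 1.05 \times 10^{-3} $
    H-MADDPG3 | $ 0.7 \times 10^{-1} $ | $ 1.1 \times 10^{-3} $
    EED | $ 1 \times 10^{-1} $ | $ 1 \times 10^{-3} $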
Publication history
  • Received: 2022-04-22
  • Revised: 2022-06-01
  • Published online: 2022-06-22
  • Issue date: 2023-06-10
