Multi-Hop UAV Ad Hoc Network Access Control Protocol: Deep Reinforcement Learning-Based Time Slot Allocation Method

SONG Liubin, GUO Daoxing

Citation: SONG Liubin, GUO Daoxing. Multi-Hop UAV Ad Hoc Network Access Control Protocol: Deep Reinforcement Learning-Based Time Slot Allocation Method[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT241044


doi: 10.11999/JEIT241044
Details
    About the authors:

    SONG Liubin: Male, Ph.D. candidate. Research interests: wireless ad hoc networks

    GUO Daoxing: Male, Professor. Research interests: satellite communications and wireless communications

    Corresponding author:

    GUO Daoxing, xyzgfg@sina.com

  • CLC number: TN926.1


  • Abstract: In UAV ad hoc networks, traffic is unevenly distributed across nodes, which easily leads to network congestion and low utilization of time-slot resources. This paper studies the access control problem in UAV ad hoc networks where saturated and unsaturated nodes coexist, aiming to let more nodes share the idle time slots of unsaturated nodes and thereby increase network throughput. For the access control problem in multi-hop UAV ad hoc networks, a deep-reinforcement-learning-based MAC protocol for multi-hop UAV ad hoc networks (DQL-MHTDMA) is proposed: the saturated nodes are combined into one large agent that learns the network topology and the slot-occupancy patterns and selects the optimal access action, achieving the maximum throughput or the best energy efficiency in each time slot. Simulation results show that the proposed DQL-MHTDMA protocol learns the slot-occupancy patterns, is aware of the multi-hop topology, and attains optimal throughput or optimal energy efficiency under a variety of unsaturated traffic arrival patterns.
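    The joint-agent formulation in the abstract can be pictured roughly as follows. This is a minimal sketch under assumptions: the class and parameter names (`JointAgentEnvView`, `history_len`), the 0/1/2 observation coding, and the action layout are illustrative only, not taken from the paper; the history length 20 follows Table 3.

```python
import numpy as np

# Sketch (assumed layout, not the authors' implementation): the saturated
# nodes act as one joint agent whose action in each slot is "which saturated
# node, if any, transmits".
class JointAgentEnvView:
    def __init__(self, saturated_nodes, history_len=20):
        self.saturated_nodes = saturated_nodes   # agent member set N_S
        self.history_len = history_len           # M past observations (Table 3: M = 20)
        self.history = []                        # channel observations z_t

    def observe(self, z_t):
        """Append the latest observation z_t (assumed coding: 0 = idle,
        1 = busy, 2 = collision) and keep only the last M entries."""
        self.history.append(z_t)
        self.history = self.history[-self.history_len:]

    def state(self):
        """State s_t: the last M observations, zero-padded at the front."""
        pad = [0] * (self.history_len - len(self.history))
        return np.array(pad + self.history, dtype=np.float32)

    def actions(self):
        """Action set A_{s_t}: stay silent (None) or let one of the
        saturated member nodes transmit in the current slot."""
        return [None] + list(self.saturated_nodes)
```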
  • Figure 1  UAV ad hoc network system model

    Figure 2  Deep reinforcement learning TDMA decision model for the multi-hop ad hoc network

    Figure 3  Network topology of Scenario 2

    Figure 4  Neural network architecture

    Figure 5  Convergence of throughput versus the number of iterations (Scenario 1)

    Figure 6  Comparison of throughput over time for the three protocols

    Figure 7  Throughput comparison (Scenario 1)

    Figure 8  Energy efficiency of slot 11 versus the theoretical optimum (Scenario 1)

    Figure 9  Throughput convergence of slots 8 and 9 in Scenario 2

    Figure 10  Convergence of the energy efficiency of slot 11 versus the number of iterations (Scenario 1)

    Figure 11  Throughput convergence of slot 11 under the energy-efficiency-first objective (Scenario 1)

    Figure 12  Energy efficiency in different slots versus the theoretical values (Scenario 1)

    Figure 13  Throughput comparison across slots under the energy-efficiency-first objective (Scenario 1)

    Algorithm 1  DQL-MHTDMA (objective: maximum throughput or best energy efficiency)

     (1) Run the relay selection algorithm to select the relay nodes
     (2) Determine the agent member set $ {N_S} $
     (3) Initialize parameters: initial state $ {s_0} $, exploration probability $ \varepsilon $ of the greedy policy, discount
       factor $ \gamma $, adjustment step $ \rho $, minibatch size $ {N_E} $, update period $ F $
     (4) Initialize the experience pool $ EM $
     (5) Initialize the QNN parameters $ \theta $
     (6) Initialize the target QNN parameters $ {\theta ^ - } $
     (7) for t = 0, 1, 2, ···, do
       Feed $ {s_t} $ into the QNN, which outputs $ Q = \left\{ {q\left( {{s_t},a,\theta } \right)\left| {a \in {A_{{s_t}}}} \right.} \right\} $
       Select action $ {a_t} $ from $ Q $ with the $ \varepsilon $-greedy policy
       Execute action $ {a_t} $, obtain $ {z_t} $ and $ {r_{t + 1}} $, and derive $ {s_{t + 1}} $
       Store $ \left( {{s_t},{a_t},{r_{t + 1}},{s_{t + 1}}} \right) $ into the experience pool $ EM $
       Train the QNN:
        Randomly sample $ {N_E} $ experiences from the pool
        for each sampled experience $ e = \left( {s,a,r,s'} \right) $, do
         Compute $ y_{r,s'}^{QNN} = r + \gamma \mathop {\max }\limits_{a'} q\left( {s',a';{\theta ^ - }} \right) $
        end for
        Perform gradient descent to update $ \theta $ in the QNN
        If $ \left( {t\bmod F == 0} \right) $, update the target network $ {\theta ^ - } = \theta $
       end training
     end for
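    A minimal, self-contained sketch of the training loop in Algorithm 1 follows, using a generic DQN in PyTorch. The network sizes, the `env.step(a)` interface returning `(s_next, r)`, and all names are assumptions for illustration; the paper's QNN architecture (Figure 4) is not reproduced here. Where Table 3 gives a value, it is used.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 20, 8                 # assumed: M-slot history, |A_{s_t}| actions
GAMMA, EPS, N_E, F = 0.9, 0.01, 64, 100      # Table 3 (EPS within its 0.005~0.010 range)

def make_qnn():
    # Small fully connected Q-network (layer sizes are assumptions)
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

qnn, target_qnn = make_qnn(), make_qnn()          # steps (5)-(6)
target_qnn.load_state_dict(qnn.state_dict())
opt = torch.optim.SGD(qnn.parameters(), lr=1.0)   # Table 3 lists learning rate 1
em = deque(maxlen=1000)                           # experience pool EM, step (4)

def train_loop(env, s, steps=10_000):
    for t in range(steps):                        # step (7)
        with torch.no_grad():
            q = qnn(torch.as_tensor(s).float())   # Q = {q(s_t, a; theta)}
        # epsilon-greedy action selection
        a = random.randrange(N_ACTIONS) if random.random() < EPS else int(q.argmax())
        s_next, r = env.step(a)                   # execute a_t, observe r_{t+1}, s_{t+1}
        em.append((s, a, r, s_next))              # store experience in EM
        if len(em) >= N_E:
            batch = random.sample(em, N_E)        # random minibatch of N_E samples
            S  = torch.as_tensor(np.stack([b[0] for b in batch])).float()
            A  = torch.as_tensor([b[1] for b in batch]).long()
            R  = torch.as_tensor([b[2] for b in batch]).float()
            S2 = torch.as_tensor(np.stack([b[3] for b in batch])).float()
            with torch.no_grad():                 # y = r + gamma * max_a' q(s', a'; theta^-)
                y = R + GAMMA * target_qnn(S2).max(dim=1).values
            pred = qnn(S).gather(1, A.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(pred, y)
            opt.zero_grad(); loss.backward(); opt.step()   # gradient step on theta
        if t % F == 0:                            # periodic target-network update
            target_qnn.load_state_dict(qnn.state_dict())
        s = s_next
```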

    Table 1  Slot-occupancy patterns of the unsaturated nodes (Scenario 1)

    Node  Tx probability  Pattern     Node  Tx probability  Pattern
    4     0.2             random      12    0.12            random
    5     0.4             random      13    1/2             periodic
    7     0.67            random      14    0.35            random
    9     1/4             periodic    15    1/5             periodic
    10    0.8             random      16    0.9             random
    11    0.55            random
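    The two pattern types in Table 1 can be read via a small generator (hypothetical, for illustrating the table's semantics only): a "random" node transmits in each of its slots independently with the listed probability, while a "periodic 1/n" node transmits in every n-th slot.

```python
import random

def slot_occupancy(pattern, p_or_period, n_slots):
    """Hypothetical generator for the occupancy patterns in Table 1:
    'random'   -> Bernoulli(p) per slot, p = listed transmission probability
    'periodic' -> transmit every `period`-th slot (listed as 1/period)."""
    if pattern == "random":
        return [int(random.random() < p_or_period) for _ in range(n_slots)]
    period = p_or_period
    return [int(t % period == 0) for t in range(n_slots)]

# Node 4 of Scenario 1: random with probability 0.2
print(slot_occupancy("random", 0.2, 10))
# Node 9 of Scenario 1: periodic with period 4 (listed as 1/4)
print(slot_occupancy("periodic", 4, 10))
```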

    Table 2  Slot-occupancy patterns of the unsaturated nodes (Scenario 2)

    Node  Tx probability  Node  Tx probability  Node  Tx probability
    3     0.79            13    0.6             23    0.55
    4     0.31            14    0.18            24    0
    5     0               15    0.23            25    0.45
    6     0.2             16    0.26            26    0.65
    7     0.23            17    0.65            27    0.63
    8     0.53            18    0.24            28    0.08
    9     0.17            19    0.19            29    0
    10    0.67            20    0.69            30    0.23
    11    0.11            21    0.75            31    0
    12    0               22    0.28            32    0.91

    Table 3  Hyperparameter settings

    Parameter                                                      Value
    History state length $ M $                                     20
    Discount factor $ \gamma $                                     0.9
    Exploration probability $ \varepsilon $ of the greedy policy   0.005~0.010
    Learning rate                                                  1
    Target network update period $ F $                             100
    Minibatch size $ {N_E} $                                       64
    Experience pool size                                           1000
    Encouragement parameter $ \beta $                              12
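    For convenience, Table 3 can be collected into a single config block for the training-loop sketch given after Algorithm 1 (the key names are assumed; the values are taken from the table):

```python
# Hyperparameters from Table 3 (key names are assumed, values from the table)
HYPERPARAMS = {
    "history_len_M": 20,        # history state length M
    "gamma": 0.9,               # discount factor
    "epsilon": (0.005, 0.010),  # exploration probability range of the greedy policy
    "learning_rate": 1,         # as listed in Table 3
    "target_update_F": 100,     # target network update period
    "batch_size_N_E": 64,       # minibatch size
    "replay_size": 1000,        # experience pool size
    "beta": 12,                 # encouragement parameter
}
```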
Publication history
  • Received: 2024-11-26
  • Revised: 2025-04-02
  • Published online: 2025-04-15
