802.11ax Uplink Scheduling Algorithm Based on Reinforcement Learning

HUANG Xinlin, ZHENG Renhua

Citation: HUANG Xinlin, ZHENG Renhua. 802.11ax Uplink Scheduling Algorithm Based on Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2022, 44(5): 1800-1808. doi: 10.11999/JEIT210590

doi: 10.11999/JEIT210590
Funds: The National Natural Science Foundation of China (62071332), Shanghai Rising-Star Program (19QA1409100), The Fundamental Research Funds for the Central Universities
Details
    About the authors:

    HUANG Xinlin: male, born in 1985, professor and doctoral supervisor; research interests: machine learning and intelligent communications

    ZHENG Renhua: male, born in 1996, master's student; research interests: reinforcement learning and intelligent communications

    Corresponding author:

    ZHENG Renhua 471539350@qq.com

  • CLC number: TN915; TP393
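  • Abstract: With the arrival of the Internet of Things (IoT) era, wireless network saturation has become increasingly severe. To cope with dense terminal access, the IEEE Standards Association (IEEE-SA) developed the latest wireless local area network standard, IEEE 802.11ax. The standard uses Orthogonal Frequency Division Multiple Access (OFDMA) to divide wireless channel resources more finely; the resulting sub-channels are called Resource Units (RUs). To solve the channel resource scheduling problem of the 802.11ax uplink in dense-user environments, this paper proposes an RU scheduling algorithm based on reinforcement learning. The algorithm trains a pointer network with the Actor-Critic algorithm to solve the adaptive RU scheduling problem, and allocates RUs to users in a way that guarantees both priority and fairness. Simulation results show that the proposed algorithm is more effective than traditional scheduling methods in the IEEE 802.11ax uplink, generalizes well, and is suitable for IoT scenarios with dense users.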
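  • Figure 1  Dividing a 20 MHz channel using RUs of various sizes

    Figure 2  OFDMA-based scheduled uplink access procedure in 802.11ax

    Figure 3  Structure of the pointer network

    Figure 4  Simulated throughput of the proposed algorithm over time

    Figure 5  Simulated throughput of STA1 and STA63 over time under the four algorithms

    Figure 6  Simulated total value of uplink data flows over time for the four algorithms

    Figure 7  Average total value of uplink data flows versus the number of STAs for the four algorithms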

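    Table 1  Mapping between QoS values and service types

    QoS  Service type
    1    Probe requests, fire alarms, traffic accident alarms, etc.
    2    Patient monitoring, industrial equipment monitoring, etc.
    3    Smart home, smart agriculture, warehouse management, etc.
    4    Surveillance video, smart water meters, smart electricity meters, etc.
    5    Channel quality indicators, radio measurement services, etc.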

    Table 2  Data rates (Mbps) for different MCSs and RU sizes

    MCS index  MCS            26 tones  52 tones  106 tones  242 tones  484 tones  996 tones
    1          BPSK, 1/2      0.8       1.7       3.5        8.1        16.3       34.0
    2          QPSK, 1/2      1.7       3.3       7.1        16.3       32.5       68.1
    3          QPSK, 3/4      2.5       5.0       10.6       24.4       48.8       102.1
    4          16-QAM, 1/2    3.3       6.7       14.2       32.5       65.0       136.1
    5          16-QAM, 3/4    5.0       10.0      21.3       48.8       97.5       204.2
    6          64-QAM, 2/3    6.7       13.3      28.3       65.0       130.0      272.2
    7          64-QAM, 3/4    7.5       15.0      31.9       73.1       146.3      306.3
    8          64-QAM, 5/6    8.3       16.7      35.4       81.3       162.5      340.3
    9          256-QAM, 3/4   10.0      20.0      42.5       97.5       195.0      408.3
    10         256-QAM, 5/6   11.1      22.2      47.2       108.3      216.7      453.7
    11         1024-QAM, 3/4  -         -         -          121.9      243.8      510.4
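As a rough illustration of how the rates in Table 2 translate into transmission time, the sketch below looks up a few cells of the table and computes the airtime of a payload. The function name `payload_airtime_ms`, the 1500-byte payload, and the choice to ignore preambles, trigger frames, and all other protocol overhead are assumptions made here, not part of the paper.

```python
# Data rates in Mbps copied from a few cells of Table 2, keyed by
# (MCS index as numbered in the table, RU size in tones).
RATE_MBPS = {
    (1, 26): 0.8, (1, 242): 8.1,
    (5, 26): 5.0, (5, 242): 48.8,
    (9, 26): 10.0, (9, 242): 97.5,
}


def payload_airtime_ms(payload_bytes, mcs_index, ru_tones):
    """Time in ms to send the payload bits at the tabulated rate (overhead ignored)."""
    rate_mbps = RATE_MBPS[(mcs_index, ru_tones)]
    return payload_bytes * 8 / (rate_mbps * 1e6) * 1e3


if __name__ == "__main__":
    # Hypothetical 1500-byte payload, chosen only for illustration.
    for key in sorted(RATE_MBPS):
        print(key, round(payload_airtime_ms(1500, *key), 3), "ms")
```

Under these assumptions, a larger RU or a higher MCS shortens the airtime proportionally, which is the quantity a scheduler trades off when it assigns RUs to stations.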

    Table 3  Training the pointer network with the Actor-Critic algorithm

     (1) Initialize the hyperparameters, initialize the training set $ C^{\text{in}} $, set the total number of training steps $ T $, and set the batch size $ N $
     (2) Initialize the pointer network parameters $ \theta $
     (3) Initialize the Critic network parameters $ \theta_v $
     (4) for t = 1 to $ T $:
     (5)   Draw inputs from the training set:
         $ \boldsymbol{c}_i \sim \text{SampleInput}(C^{\text{in}}) \text{ for } i \in \{1,2,\cdots,N\} $
     (6)   Use $ \theta $ to select a subset of items:
         $ \pi_i \sim \text{SampleSolution}(p_\theta(\cdot|\boldsymbol{c}_i)) \text{ for } i \in \{1,2,\cdots,N\} $
     (7)   Use $ \theta_v $ to compute the baseline:
         $ b(\boldsymbol{c}_i) = b_{\theta_v}(\boldsymbol{c}_i) \text{ for } i \in \{1,2,\cdots,N\} $
     (8)   Compute the gradient of the Actor objective:
         $ \nabla_\theta J(\theta) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^N \left( V(\pi_i|\boldsymbol{c}_i) - b(\boldsymbol{c}_i) \right) \nabla_\theta \ln p_\theta(\pi_i|\boldsymbol{c}_i) $
     (9)   Compute the Critic loss:
         $ L(\theta_v) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^N \left\| b_{\theta_v}(\boldsymbol{c}_i) - V(\pi_i|\boldsymbol{c}_i) \right\|_2^2 $
     (10)  Update $ \theta $ with the Adam optimizer:
         $ \theta = \text{Adam}(\theta, \nabla_\theta J(\theta)) $
     (11)  Update $ \theta_v $ with the Adam optimizer:
         $ \theta_v = \text{Adam}(\theta_v, \nabla_{\theta_v} L(\theta_v)) $
     (12) end
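The loop in Table 3 can be sketched on a toy problem as follows. This is a minimal sketch under strong assumptions, not the authors' implementation: the pointer network is replaced by a one-parameter softmax policy whose score for an item is `theta` times the item's value, the Critic is a two-parameter linear baseline, inputs are generated on the fly instead of being drawn from a stored training set, and plain gradient steps stand in for Adam. All function names, learning rates, and problem sizes are hypothetical.

```python
import math
import random


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_solution(theta, c, k, rng):
    """Sample k distinct item indices; also return d/dtheta of ln p_theta(pi | c)."""
    remaining = list(range(len(c)))
    chosen, grad_logp = [], 0.0
    for _ in range(k):
        probs = softmax([theta * c[j] for j in remaining])
        pos = rng.choices(range(len(remaining)), weights=probs)[0]
        expected = sum(p * c[j] for p, j in zip(probs, remaining))
        grad_logp += c[remaining[pos]] - expected  # gradient of this step's log-softmax
        chosen.append(remaining.pop(pos))
    return chosen, grad_logp


def train(steps=300, batch=16, n_items=8, k=3,
          lr_actor=0.1, lr_critic=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0      # Actor parameter (toy stand-in for the pointer-network weights)
    w = [0.0, 0.0]   # Critic parameters: b(c) = w[0] + w[1] * mean(c)
    for _step in range(steps):
        grad_theta, grad_w = 0.0, [0.0, 0.0]
        for _sample in range(batch):
            c = [rng.random() for _ in range(n_items)]         # step (5): sample an input
            pi, grad_logp = sample_solution(theta, c, k, rng)  # step (6): sample a solution
            feats = [1.0, sum(c) / n_items]
            b = w[0] * feats[0] + w[1] * feats[1]              # step (7): baseline
            v = sum(c[j] for j in pi)                          # V(pi | c): value of the subset
            grad_theta += (v - b) * grad_logp / batch          # step (8): Actor gradient
            for d in range(2):                                 # step (9): Critic loss gradient
                grad_w[d] += 2.0 * (b - v) * feats[d] / batch
        theta += lr_actor * grad_theta                         # ascent on J (Adam in Table 3)
        w = [w[d] - lr_critic * grad_w[d] for d in range(2)]   # descent on L (Adam in Table 3)
    return theta, w


if __name__ == "__main__":
    theta, w = train()
    print("actor parameter:", theta, "critic parameters:", w)
```

Because the reward here is simply the summed value of the selected items, training should push `theta` upward so that higher-valued items are picked more often; the baseline only reduces the variance of the gradient estimate and does not change its expectation.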

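    Table 4  Average waiting time (ms) of five representative STAs under the four algorithms

    Algorithm                    STA1  STA21  STA41  STA61  STA81
    Round-robin algorithm        8.73  8.83   8.73   8.60   9.01
    PRA algorithm                5.42  7.36   10.87  13.84  16.90
    Adaptive grouping algorithm  9.10  9.14   9.12   9.13   9.61
    Proposed algorithm           4.49  5.65   7.97   9.31   11.56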
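One simple way to read Table 4 is to compare, per algorithm, the mean waiting time over the five listed STAs and the gap between the longest and shortest wait. The sketch below does only that arithmetic on the tabulated values; the mean and the spread are descriptive choices made here, not metrics defined in the paper.

```python
# Average waiting times in ms for STA1, STA21, STA41, STA61, STA81 (from Table 4).
WAIT_MS = {
    "Round-robin": [8.73, 8.83, 8.73, 8.60, 9.01],
    "PRA": [5.42, 7.36, 10.87, 13.84, 16.90],
    "Adaptive grouping": [9.10, 9.14, 9.12, 9.13, 9.61],
    "Proposed": [4.49, 5.65, 7.97, 9.31, 11.56],
}


def summarize(waits):
    """Return (mean, max - min) of a list of waiting times."""
    mean = sum(waits) / len(waits)
    spread = max(waits) - min(waits)
    return mean, spread


SUMMARY = {name: summarize(waits) for name, waits in WAIT_MS.items()}

if __name__ == "__main__":
    for name, (mean, spread) in SUMMARY.items():
        print(f"{name:18s} mean={mean:.2f} ms  spread={spread:.2f} ms")
```

On these five STAs the proposed algorithm has the lowest mean waiting time, and its spread is smaller than that of the PRA algorithm but larger than that of the round-robin and adaptive grouping algorithms.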
Publication history
  • Received:  2021-06-17
  • Revised:  2022-01-16
  • Accepted:  2022-01-14
  • Published online:  2022-02-02
  • Issue date:  2022-05-25
