802.11ax Uplink Scheduling Algorithm Based on Reinforcement Learning
Abstract: With the arrival of the Internet of Things (IoT) era, wireless network saturation has become increasingly severe. To cope with dense terminal access, the IEEE Standards Association (IEEE-SA) has formulated the latest standard for wireless local area networks, IEEE 802.11ax. The standard uses Orthogonal Frequency Division Multiple Access (OFDMA) to divide the wireless channel more finely into groups of tones; the resulting sub-channels are called Resource Units (RUs). To solve the channel resource scheduling problem of the 802.11ax uplink in dense user environments, this paper proposes an RU scheduling algorithm based on reinforcement learning. The algorithm trains a pointer network with the Actor-Critic algorithm to solve the adaptive RU scheduling problem, and allocates RUs to users in a way that guarantees both priority and fairness. Simulation results show that the proposed algorithm is more effective than traditional scheduling methods in the IEEE 802.11ax uplink and generalizes well, which makes it suitable for IoT scenarios in dense user environments.
Key words:
- Internet of Things (IoT)
- IEEE 802.11ax
- Reinforcement learning
- Uplink
- Actor-Critic
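As a rough illustration of why RU granularity matters for scheduling, the sketch below estimates how long a payload occupies an RU at a given rate. The rates are copied from Table 2 of this paper (MCS index, RU size in tones). The function name `payload_airtime_ms` and the 100-byte payload are assumptions made for illustration, and the estimate ignores preambles, trigger frames and other MAC/PHY overhead.

```python
# Data rates in Mbps, copied from Table 2: key is (MCS index, RU size in tones).
RATE_MBPS = {
    (2, 26): 1.7,    # QPSK 1/2 on a 26-tone RU
    (2, 106): 7.1,   # QPSK 1/2 on a 106-tone RU
    (5, 26): 5.0,    # 16-QAM 3/4 on a 26-tone RU
    (5, 242): 48.8,  # 16-QAM 3/4 on a 242-tone RU
}


def payload_airtime_ms(payload_bytes, mcs_index, ru_tones):
    """Return the payload-only transmission time in milliseconds.

    Hypothetical helper: overhead (preamble, trigger frame, ACK) is ignored.
    """
    rate_bps = RATE_MBPS[(mcs_index, ru_tones)] * 1e6
    return payload_bytes * 8 / rate_bps * 1e3


if __name__ == "__main__":
    for key in sorted(RATE_MBPS):
        t = payload_airtime_ms(100, *key)
        print(f"MCS {key[0]}, {key[1]} tones: {t:.3f} ms for 100 bytes")
```

Under these assumptions, a small IoT report fits comfortably in a narrow RU, so a scheduler can serve many stations in parallel instead of giving one station the whole channel.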
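Table 1 Correspondence between QoS values and service types

| QoS | Service type |
|---|---|
| 1 | Probe requests, fire alarms, traffic accident alarms, etc. |
| 2 | Patient monitoring, industrial equipment monitoring, etc. |
| 3 | Smart home, smart agriculture, warehouse management, etc. |
| 4 | Surveillance video, smart water meters, smart electricity meters, etc. |
| 5 | Channel quality indicators, radio measurement services, etc. |

Table 2 Data transmission rates (Mbps) for different MCSs and RU sizes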
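
| MCS index | MCS | 26 tones | 52 tones | 106 tones | 242 tones | 484 tones | 996 tones |
|---|---|---|---|---|---|---|---|
| 1 | BPSK, 1/2 | 0.8 | 1.7 | 3.5 | 8.1 | 16.3 | 34.0 |
| 2 | QPSK, 1/2 | 1.7 | 3.3 | 7.1 | 16.3 | 32.5 | 68.1 |
| 3 | QPSK, 3/4 | 2.5 | 5.0 | 10.6 | 24.4 | 48.8 | 102.1 |
| 4 | 16-QAM, 1/2 | 3.3 | 6.7 | 14.2 | 32.5 | 65.0 | 136.1 |
| 5 | 16-QAM, 3/4 | 5.0 | 10.0 | 21.3 | 48.8 | 97.5 | 204.2 |
| 6 | 64-QAM, 2/3 | 6.7 | 13.3 | 28.3 | 65.0 | 130.0 | 272.2 |
| 7 | 64-QAM, 3/4 | 7.5 | 15.0 | 31.9 | 73.1 | 146.3 | 306.3 |
| 8 | 64-QAM, 5/6 | 8.3 | 16.7 | 35.4 | 81.3 | 162.5 | 340.3 |
| 9 | 256-QAM, 3/4 | 10.0 | 20.0 | 42.5 | 97.5 | 195.0 | 408.3 |
| 10 | 256-QAM, 5/6 | 11.1 | 22.2 | 47.2 | 108.3 | 216.7 | 453.7 |
| 11 | 1024-QAM, 3/4 | – | – | – | 121.9 | 243.8 | 510.4 |

Table 3 Training the pointer network with the Actor-Critic algorithm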
(1) Initialize the hyperparameters and the training set $C^{\text{in}}$; set the total number of training steps $T$ and the batch size $N$
(2) Initialize the pointer network parameters $\theta$
(3) Initialize the Critic network parameters $\theta_v$
(4) for $t = 1$ to $T$:
(5)   Sample inputs from the training set: $\boldsymbol{c}_i \sim \text{SampleInput}(C^{\text{in}})$ for $i \in \{1,2,\cdots,N\}$
(6)   Select a subset of items using $\theta$: $\pi_i \sim \text{SampleSolution}(p_\theta(\cdot \mid \boldsymbol{c}_i))$ for $i \in \{1,2,\cdots,N\}$
(7)   Compute the baseline using $\theta_v$: $b(\boldsymbol{c}_i) = b_{\theta_v}(\boldsymbol{c}_i)$ for $i \in \{1,2,\cdots,N\}$
(8)   Compute the gradient of the Actor objective: $\nabla_\theta J(\theta) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \left( V(\pi_i \mid \boldsymbol{c}_i) - b(\boldsymbol{c}_i) \right) \nabla_\theta \ln p_\theta(\pi_i \mid \boldsymbol{c}_i)$
(9)   Compute the Critic loss: $L(\theta_v) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \left\| b_{\theta_v}(\boldsymbol{c}_i) - V(\pi_i \mid \boldsymbol{c}_i) \right\|_2^2$
(10)  Update $\theta$ with the Adam optimizer: $\theta = \text{Adam}(\theta, \nabla_\theta J(\theta))$
(11)  Update $\theta_v$ with the Adam optimizer: $\theta_v = \text{Adam}(\theta_v, \nabla_{\theta_v} L(\theta_v))$
(12) end

Table 4 Average waiting time (ms) of five representative STAs under the four algorithms
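
| Algorithm | STA1 | STA21 | STA41 | STA61 | STA81 |
|---|---|---|---|---|---|
| Round-robin algorithm | 8.73 | 8.83 | 8.73 | 8.60 | 9.01 |
| PRA algorithm | 5.42 | 7.36 | 10.87 | 13.84 | 16.90 |
| Adaptive grouping algorithm | 9.10 | 9.14 | 9.12 | 9.13 | 9.61 |
| Proposed algorithm | 4.49 | 5.65 | 7.97 | 9.31 | 11.56 |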
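The training loop of Table 3 can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the paper's implementation: a single scalar `theta` scoring items through a softmax stands in for the pointer network, the Critic is a linear baseline on the mean item value, plain gradient steps replace Adam, and the value $V(\pi \mid \boldsymbol{c})$ is taken to be the sum of the selected item values. All function and parameter names are hypothetical.

```python
import math
import random


def sample_solution(theta, items, k, rng):
    """Sample k distinct item indices from a softmax policy.

    Returns the chosen indices and d/dtheta of the log-probability of the sample.
    """
    remaining = list(range(len(items)))
    chosen, grad_logp = [], 0.0
    for _ in range(k):
        logits = [theta * items[j] for j in remaining]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]  # stable softmax
        total = sum(weights)
        probs = [w / total for w in weights]
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        # For logits theta * x_j: d/dtheta log p(pick) = x_pick - E_p[x]
        expected = sum(p * items[j] for p, j in zip(probs, remaining))
        grad_logp += items[remaining[pick]] - expected
        chosen.append(remaining.pop(pick))
    return chosen, grad_logp


def train(steps=300, batch=32, n_items=6, k=2,
          lr_actor=0.05, lr_critic=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0        # actor parameter (stand-in for pointer-network weights)
    w, b0 = 0.0, 0.0   # critic parameters: b(c) = w * mean(c) + b0
    for _ in range(steps):                                   # step (4)
        grad_theta = grad_w = grad_b0 = 0.0
        for _ in range(batch):
            items = [rng.random() for _ in range(n_items)]   # step (5)
            chosen, grad_logp = sample_solution(theta, items, k, rng)  # (6)
            value = sum(items[j] for j in chosen)            # V(pi | c)
            feat = sum(items) / n_items
            baseline = w * feat + b0                         # step (7)
            advantage = value - baseline
            grad_theta += advantage * grad_logp / batch      # step (8)
            grad_w += -2.0 * advantage * feat / batch        # step (9): dL/dw
            grad_b0 += -2.0 * advantage / batch              # step (9): dL/db0
        theta += lr_actor * grad_theta   # ascent on J (plain step, not Adam)
        w -= lr_critic * grad_w          # descent on L (plain step, not Adam)
        b0 -= lr_critic * grad_b0
    return theta, w, b0


if __name__ == "__main__":
    theta, w, b0 = train()
    print("learned theta:", theta, "critic:", w, b0)
```

In this toy setting a positive `theta` means the policy has learned to prefer higher-valued items; subtracting the Critic baseline does not change the expected gradient but reduces its variance, which is the role of $b(\boldsymbol{c}_i)$ in step (8).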