802.11ax Uplink Scheduling Algorithm Based on Reinforcement Learning

HUANG Xinlin, ZHENG Renhua

College of Electronic and Information Engineering, Tongji University, Shanghai 201800, China

doi: 10.11999/JEIT210590 cstr: 32379.14.JEIT210590

Funds: The National Natural Science Foundation of China (62071332), Shanghai Rising-Star Program (19QA1409100), The Fundamental Research Funds for the Central Universities

Author information:
HUANG Xinlin: male, born in 1985, professor and doctoral supervisor; research interests: machine learning and intelligent communications

ZHENG Renhua: male, born in 1996, master's student; research interests: reinforcement learning and intelligent communications

Corresponding author:
ZHENG Renhua　471539350@qq.com

Chinese Library Classification number: TN915; TP393

Publication history
- Received: 2021-06-17
- Revised: 2022-01-16
- Accepted: 2022-01-14
- Published online: 2022-02-02
- Issue date: 2022-05-25

Abstract: With the arrival of the Internet of Things (IoT) era, wireless network saturation has become increasingly serious. To cope with dense terminal access, the IEEE Standards Association (IEEE-SA) formulated IEEE 802.11ax, the latest standard for wireless local area networks. The standard uses Orthogonal Frequency Division Multiple Access (OFDMA) to divide the wireless channel more finely into groups of tones, and the resulting sub-channels are called Resource Units (RUs). To solve the channel resource scheduling problem of the 802.11ax uplink in dense user environments, this paper proposes an RU scheduling algorithm based on reinforcement learning. The algorithm trains a pointer network with the Actor-Critic algorithm to solve the adaptive RU scheduling problem, and allocates RUs to users reasonably while guaranteeing both priority and fairness. Simulation results show that the proposed algorithm is more effective than traditional scheduling methods in the IEEE 802.11ax uplink and generalizes well, which makes it suitable for IoT scenarios in dense user environments.

Keywords: Internet of Things (IoT); IEEE 802.11ax; Reinforcement learning; Uplink; Actor-Critic

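Fig. 1 Division of a 20 MHz channel using RUs of various sizes

Fig. 2 OFDMA-based 802.11ax uplink scheduled access procedure

Fig. 3 Structure of the pointer network

Fig. 4 Simulated throughput of the proposed algorithm over time

Fig. 5 Simulated throughput of STA₁ and STA₆₃ over time under the four algorithms

Fig. 6 Simulated total value of the uplink data flows over time under the four algorithms

Fig. 7 Average total value of the uplink data flows versus the number of STAs under the four algorithms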

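Table 1 Correspondence between QoS values and service types

QoS	Service type
1	Probe requests, fire alarms, traffic accident alarms, etc.
2	Patient monitoring, industrial equipment monitoring, etc.
3	Smart home, smart agriculture, warehouse management, etc.
4	Surveillance video, smart water meters, smart electricity meters, etc.
5	Channel quality indicators, radio measurement services, etc.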

Table 2 Data rates (Mbps) for different MCSs and RU sizes

MCS index	MCS	26 tones	52 tones	106 tones	242 tones	484 tones	996 tones
1	BPSK, 1/2	0.8	1.7	3.5	8.1	16.3	34.0
2	QPSK, 1/2	1.7	3.3	7.1	16.3	32.5	68.1
3	QPSK, 3/4	2.5	5.0	10.6	24.4	48.8	102.1
4	16-QAM, 1/2	3.3	6.7	14.2	32.5	65.0	136.1
5	16-QAM, 3/4	5.0	10.0	21.3	48.8	97.5	204.2
6	64-QAM, 2/3	6.7	13.3	28.3	65.0	130.0	272.2
7	64-QAM, 3/4	7.5	15.0	31.9	73.1	146.3	306.3
8	64-QAM, 5/6	8.3	16.7	35.4	81.3	162.5	340.3
9	256-QAM, 3/4	10.0	20.0	42.5	97.5	195.0	408.3
10	256-QAM, 5/6	11.1	22.2	47.2	108.3	216.7	453.7
11	1024-QAM, 3/4	–	–	–	121.9	243.8	510.4

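As a rough illustration of how the rates in Table 2 translate into airtime, the sketch below looks up a rate and computes the idealized time needed to send a payload. Only the 26-, 52-, 106- and 242-tone columns are copied here. The function name `tx_time_ms`, the dictionary layout, and the 1500-byte payload are assumptions made for illustration, and preambles, padding, and other MAC/PHY overheads are ignored.

```python
# Subset of Table 2: data rate in Mbps, indexed by MCS index, then RU size in tones.
# None marks a combination that the table lists as unavailable.
RATE_MBPS = {
    1: {26: 0.8, 52: 1.7, 106: 3.5, 242: 8.1},
    2: {26: 1.7, 52: 3.3, 106: 7.1, 242: 16.3},
    3: {26: 2.5, 52: 5.0, 106: 10.6, 242: 24.4},
    4: {26: 3.3, 52: 6.7, 106: 14.2, 242: 32.5},
    5: {26: 5.0, 52: 10.0, 106: 21.3, 242: 48.8},
    6: {26: 6.7, 52: 13.3, 106: 28.3, 242: 65.0},
    7: {26: 7.5, 52: 15.0, 106: 31.9, 242: 73.1},
    8: {26: 8.3, 52: 16.7, 106: 35.4, 242: 81.3},
    9: {26: 10.0, 52: 20.0, 106: 42.5, 242: 97.5},
    10: {26: 11.1, 52: 22.2, 106: 47.2, 242: 108.3},
    11: {26: None, 52: None, 106: None, 242: 121.9},
}


def tx_time_ms(payload_bytes, mcs_index, ru_tones):
    """Return the idealized time in ms to send payload_bytes at the tabulated rate."""
    rate = RATE_MBPS[mcs_index][ru_tones]
    if rate is None:
        raise ValueError(f"MCS {mcs_index} has no rate for a {ru_tones}-tone RU")
    bits = payload_bytes * 8
    # Mbps -> bits per second, then seconds -> milliseconds.
    return bits / (rate * 1e6) * 1e3


if __name__ == "__main__":
    # Hypothetical 1500-byte payload on three (MCS index, RU size) combinations.
    for mcs, ru in [(1, 26), (6, 106), (11, 242)]:
        print(f"MCS {mcs}, {ru} tones: {tx_time_ms(1500, mcs, ru):.3f} ms")
```

Because the rate grows with both the MCS index and the RU size, the same payload occupies a small RU at a low MCS for far longer than a large RU at a high MCS, which is the trade-off an RU scheduler has to weigh.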

Table 3 Training the pointer network with the Actor-Critic algorithm

(1) Initialize the hyperparameters and the training set $C^{\text{in}}$; set the total number of training steps $T$ and the batch size $N$
(2) Initialize the pointer network parameters $\theta$
(3) Initialize the Critic network parameters $\theta_v$
(4) for t = 1 to $T$:
(5)     Sample inputs from the training set: $\boldsymbol{c}_i \sim \text{SampleInput}(C^{\text{in}})$ for $i \in \{1,2,\cdots,N\}$
(6)     Use $\theta$ to select a subset of items: $\pi_i \sim \text{SampleSolution}(p_\theta(\cdot \mid \boldsymbol{c}_i))$ for $i \in \{1,2,\cdots,N\}$
(7)     Use $\theta_v$ to compute the baseline: $b(\boldsymbol{c}_i) = b_{\theta_v}(\boldsymbol{c}_i)$ for $i \in \{1,2,\cdots,N\}$
(8)     Compute the gradient of the Actor objective: $\nabla_\theta J(\theta) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \left( V(\pi_i \mid \boldsymbol{c}_i) - b(\boldsymbol{c}_i) \right) \nabla_\theta \ln p_\theta(\pi_i \mid \boldsymbol{c}_i)$
(9)     Compute the Critic loss: $L(\theta_v) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \left\| b_{\theta_v}(\boldsymbol{c}_i) - V(\pi_i \mid \boldsymbol{c}_i) \right\|_2^2$
(10)     Update $\theta$ with the Adam optimizer: $\theta = \text{Adam}(\theta, \nabla_\theta J(\theta))$
(11)     Update $\theta_v$ with the Adam optimizer: $\theta_v = \text{Adam}(\theta_v, \nabla_{\theta_v} L(\theta_v))$
(12) end
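The loop in Table 3 can be sketched on a toy problem to make the two updates concrete. The code below is not the paper's implementation. It replaces the pointer network with a table of softmax logits, replaces the Critic network with one learned baseline per input, and uses plain gradient steps instead of Adam. The reward table `VALUE`, the function names, and all hyperparameter values are assumptions chosen only for illustration.

```python
import math
import random

# Toy stand-in for the scheduling task (illustrative values, not from the paper):
# each input c is one of three contexts, each solution pi is one of four actions,
# and VALUE[c][pi] plays the role of V(pi | c).
VALUE = [
    [1.0, 0.2, 0.0, 0.3],
    [0.1, 0.0, 1.0, 0.2],
    [0.3, 1.0, 0.2, 0.0],
]
NUM_CONTEXTS, NUM_ACTIONS = len(VALUE), len(VALUE[0])


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(total_steps=2000, batch_size=32, lr_actor=0.1, lr_critic=0.1, seed=0):
    rng = random.Random(seed)
    # Steps (2)-(3): actor parameters theta (logits), critic parameters theta_v (baselines).
    theta = [[0.0] * NUM_ACTIONS for _ in range(NUM_CONTEXTS)]
    theta_v = [0.0] * NUM_CONTEXTS
    for _ in range(total_steps):                      # step (4)
        grad_theta = [[0.0] * NUM_ACTIONS for _ in range(NUM_CONTEXTS)]
        grad_v = [0.0] * NUM_CONTEXTS
        for _ in range(batch_size):
            c = rng.randrange(NUM_CONTEXTS)           # step (5): sample an input
            probs = softmax(theta[c])
            pi = rng.choices(range(NUM_ACTIONS), weights=probs)[0]  # step (6)
            baseline = theta_v[c]                     # step (7)
            advantage = VALUE[c][pi] - baseline
            # Step (8): d ln p(pi | c) / d theta[c][a] = 1[a == pi] - probs[a].
            for a in range(NUM_ACTIONS):
                indicator = 1.0 if a == pi else 0.0
                grad_theta[c][a] += advantage * (indicator - probs[a]) / batch_size
            # Step (9): gradient of (baseline - V)^2 with respect to the baseline.
            grad_v[c] += 2.0 * (baseline - VALUE[c][pi]) / batch_size
        # Steps (10)-(11): plain gradient steps stand in for Adam here.
        for c in range(NUM_CONTEXTS):
            for a in range(NUM_ACTIONS):
                theta[c][a] += lr_actor * grad_theta[c][a]  # ascent on J
            theta_v[c] -= lr_critic * grad_v[c]             # descent on L
    return theta, theta_v


if __name__ == "__main__":
    theta, theta_v = train()
    for c in range(NUM_CONTEXTS):
        print(c, [round(p, 3) for p in softmax(theta[c])], round(theta_v[c], 3))
```

Subtracting the baseline does not change the expected Actor gradient but reduces its variance, which is why the Critic is trained to track the value of the sampled solutions.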

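Table 4 Average waiting time (ms) of five representative STAs under the four algorithms

Algorithm	STA₁	STA₂₁	STA₄₁	STA₆₁	STA₈₁
Round-robin algorithm	8.73	8.83	8.73	8.60	9.01
PRA algorithm	5.42	7.36	10.87	13.84	16.90
Adaptive grouping algorithm	9.10	9.14	9.12	9.13	9.61
Proposed algorithm	4.49	5.65	7.97	9.31	11.56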
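One simple way to read Table 4 is to compute, for each algorithm, the mean waiting time over the five listed STAs and the spread between the largest and smallest value. The sketch below does only that arithmetic on the tabulated numbers. The dictionary keys (round-robin, PRA, adaptive grouping, and the paper's algorithm) and the function name `summarize` are assumptions for illustration, and since the five STAs are representatives, the figures are indicative rather than network-wide averages.

```python
from statistics import mean

# Average waiting time (ms) per algorithm, copied from Table 4 in the order
# STA1, STA21, STA41, STA61, STA81.
WAIT_MS = {
    "round_robin": [8.73, 8.83, 8.73, 8.60, 9.01],
    "pra": [5.42, 7.36, 10.87, 13.84, 16.90],
    "adaptive_grouping": [9.10, 9.14, 9.12, 9.13, 9.61],
    "proposed": [4.49, 5.65, 7.97, 9.31, 11.56],
}


def summarize(waits):
    """Map each algorithm to (mean wait, max wait minus min wait) in ms."""
    return {name: (mean(v), max(v) - min(v)) for name, v in waits.items()}


if __name__ == "__main__":
    for name, (avg, spread) in summarize(WAIT_MS).items():
        print(f"{name}: mean {avg:.2f} ms, spread {spread:.2f} ms")
```

A small spread means the listed STAs wait about equally long, while a larger spread together with a low value for STA₁ indicates that some STAs are served ahead of others.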
