Joint Routing and Resource Scheduling Algorithm for Large-scale Multi-mode Mesh Networks Based on Reinforcement Learning
-
摘要: 为了平衡新型电力系统中大规模多模Mesh网络的传输可靠性和效率,该文在对优化问题进行描述和分析的基础上提出一种基于强化学习的大规模多模Mesh网络联合路由选择及资源调度算法,分为两个阶段。在第1阶段中,根据网络拓扑结构信息和业务需求,利用一种多条最短路径路由算法,输出所有最短路径。在第2阶段中,提出一种基于多臂老虎机(MAB)的资源调度算法,该算法基于得到的最短路径集合构建MAB的摇臂,然后根据业务需求计算回报,最终给出最优的路由选择及资源调度方式用于业务传输。仿真结果表明,所提算法能够满足不同的业务传输需求,实现端到端路径的平均时延和平均传输成功率的高效平衡。Abstract: In order to balance the transmission reliability and efficiency of large-scale multi-mode mesh networks in the new power system, a two-stage algorithm is proposed based on reinforcement learning for joint routing selection and resource scheduling in large-scale multi-mode mesh networks, building upon the description and analysis of optimization problems. In the first stage, based on the network topology information and service requirements, a multi shortest path routing algorithm is utilized to generate all the shortest paths. In the second stage, a resource scheduling algorithm based on Multi-Armed Bandit (MAB) is proposed. The algorithm constructs the arms of the MAB based on the obtained set of shortest paths, then calculates the reward according to the service demands, and finally gives the optimal route selection and resource scheduling mode for service transmission. Simulation results show that the proposed algorithm can meet different service transmission requirements, and achieve an efficient balance between the average end-to-end path delay and the average transmission success rate.
-
1 改进的宽度优先搜索算法
输入:网络拓扑邻接矩阵$ {\boldsymbol{A}} $,源节点$ n_{\mathrm{s}}^e $ (1) 将$ n_{\mathrm{s}}^e $放入队列$ {\mathrm{Queue}} $中;将$ n_{\mathrm{s}}^e $的$ {\rm{Visited}} $值设置为$ {\rm{True}} $,而其余节点的该值设置为$ {\mathrm{False}} $;将$ n_{\mathrm{s}}^e $到源节点的最短跳数$ {\rm{HopCount}}(n_{\mathrm{s}}^e) $值
设置为0,而其余节点的该值设置为无穷大;将所有节点的$ {\mathrm{Searched}} $值设置为$ {\mathrm{False}} $;将每个节点$ n $的前置节点数量$ {\rm{FrontCount}}(n) $值设
置为0,表示所有网络节点的前置节点集合$ {\rm{FrontPoint}}(n) $中前置节点数量为0(2) WHILE $ {\mathrm{Queue}} $不是空集 DO (3) 获取$ {\mathrm{Queue}} $的队首节点$ {\rm{Head}} $,遍历$ {\boldsymbol{A}} $,找出$ {\rm{Head}} $的所有邻居节点$ {\rm{Neighbo}}{{\mathrm{r}}_z},z = 1,2,\cdots,Z $ (4) FOR $ {\rm{Neighbo}}{{\mathrm{r}}_z},z = 1,2,\cdots,Z $ DO (5) IF $ {\rm{Neighbo}}{r_z} $的$ {\mathrm{Searched}} $值为$ {\rm{True}} $ THEN 跳出本次循环 (6) ELSE IF $ {\rm{Neighbo}}{{\mathrm{r}}_z} $的$ {\rm{Visited}} $值为$ {\mathrm{False}} $ THEN (7) 令$ {\rm{HopCount}}({\rm{Neighbo}}{{\mathrm{r}}_z}) = {\rm{HopCount}}({\rm{Head}}) + 1 $;将$ {\rm{FrontCount}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $值加1 把$ {\rm{Head}} $存入$ {\rm{FrontPoint}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $中;把$ {\rm{Neighbo}}{{\mathrm{r}}_z} $存入$ {\mathrm{Queue}} $中,并将其$ {\rm{Visited}} $值设置为$ {\rm{True}} $ (8) ELSE $ {\rm{Neighbo}}{{\mathrm{r}}_z} $的$ {\rm{Visited}} $值为$ {\rm{True}} $ THEN (9) IF $ {\rm{HopCount}}({\rm{Head}}) + 1 \lt {\rm{HopCount}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $ THEN (10) 令$ {\rm{HopCount}}({\rm{Neighbo}}{{\mathrm{r}}_z}) = {\rm{HopCount}}({\rm{Head}}) + 1 $;$ {\rm{FrontCount}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $值保持不变 把$ {\rm{FrontPoint}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $中最近一个存入的元素替换为$ {\rm{Head}} $ (11) ELSE IF $ {\rm{HopCount}}({\rm{Head}}) + 1 = {\rm{HopCount}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $ THEN (12) 将$ {\rm{FrontCount}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $值加1;把$ {\rm{Head}} $存入$ {\rm{FrontPoint}}({\rm{Neighbo}}{{\mathrm{r}}_z}) $中 (13) ELSE 跳出本次循环 (14) END IF (15) END IF (16) END FOR (17) 将当前$ {\rm{Head}} $的$ {\mathrm{Searched}} $值设置为$ {\rm{True}} $,并将其移出$ {\mathrm{Queue}} $ (18) END WHILE (19) 按如下方式遍历设置$ {{\boldsymbol{A}}^{\mathrm{s}}}(a_{ij}^{\mathrm{s}}) $中每个元素$ a_{ij}^{\mathrm{s}} $的值:$ {a}_{ij}^{{\mathrm{s}}}=\left\{\begin{aligned}& 1,j\in {\rm{FrontPoint}}(i) \\& 0,其他 \end{aligned},i,j\in \mathcal{N}\right. $ 输出:最短路径邻接矩阵$ {{\boldsymbol{A}}^{\mathrm{s}}} $ 2 多条最短路径输出算法
输入:最短路径邻接矩阵$ {{\boldsymbol{A}}^s} $,源节点$ n_{\mathrm{s}}^e $和目的节点$ n_{\mathrm{d}}^e $ (1) 将$ n_{\mathrm{s}}^e $压入主栈$ {{\rm{Stack}}_{\rm{main}}} $;遍历$ {{\boldsymbol{A}}^s} $,将$ n_{\mathrm{s}}^e $的邻居节点存入邻居节点列表$ {\rm{Array}} $,然后将$ {\rm{Array}} $作为栈顶压入辅栈$ {{\rm{Stack}}_{\rm{assist}}} $ (2) WHILE $ {{\rm{Stack}}_{\rm{main}}} $不是空集 DO (3) 获取$ {{\rm{Stack}}_{\rm{assist}}} $栈顶,作为新的$ {\rm{Array}} $ (4) IF $ {\rm{Array}} $非空 THEN (5) 获取$ {\rm{Array}} $中的首个元素,将其压入$ {{\rm{Stack}}_{\rm{main}}} $,并将剩余元素构成的列表重新压入$ {{\rm{Stack}}_{\rm{assist}}} $ (6) 查询栈顶元素的$ {\rm{Array}} $,将$ {{\rm{Stack}}_{\rm{main}}} $中包含的元素从其中剔除,再将其压入$ {{\rm{Stack}}_{\rm{assist}}} $ (7) ELSE 将$ {{\rm{Stack}}_{\rm{main}}} $和$ {{\rm{Stack}}_{\rm{assist}}} $的栈顶元素弹出 (8) END IF (9) IF $ {{\rm{Stack}}_{\rm{main}}} $的栈顶元素与$ n_{\mathrm{d}}^e $相等 THEN 将最短路径$ p_u^e = {{\rm{Stack}}_{\rm{main}}} $存入$ {{\mathrm{Path}}^e} $;将$ {{\rm{Stack}}_{\rm{main}}} $和$ {{\rm{Stack}}_{\rm{assist}}} $的栈顶元素弹出 (10) END IF (11) END WHILE 输出:最短路径集合$ {{\mathrm{Path}}^e} = \left\{ {p_1^e,p_2^e,\cdots ,p_U^e} \right\} $ 3 基于MAB的资源调度选择算法
输入:摇臂集合$ {{\mathrm{Arm}}^e} $,最大迭代次数$ {T_{\max}} $ (1) 令$ Q({\bf{arm}}_k^e) = 0 $,$ {\mathrm{count}}({\bf{arm}}_k^e) = 0 $ (2) FOR $ t = 1,2,\cdots,{T_{\max}} $ DO (3) 根据式(16)更新$ \epsilon(t) $,然后按如下方式选择摇臂$ {\mathrm{arm}}_k^e $:$ {{\mathrm{arm}}}_{k}^{e}=\left\{\begin{aligned}& 从{{\bf{arm}}}_{1}^{e},{{\bf{arm}}}_{2}^{e},\cdots ,{{\bf{arm}}}_{K}^{e}中以均匀分布随机选取,以\varepsilon(t)的概率 \\ & {\underset{{{\bf{arm}}}_{k}^{e}}{{\bf{argmax}}}}{Q}_{t}({{\bf{arm}}}_{k}^{e}),以1-\varepsilon(t)的概率 \end{aligned} \right. $ (4) 令$ {\mathrm{count}}({\bf{arm}}_k^e) = {\mathrm{count}}({\bf{arm}}_k^e) + 1 $ (5) 根据式(14)计算$ {R_t}({\bf{arm}}_k^e) $,然后根据式(15)更新
$ {Q_t}({\bf{arm}}_k^e) $(6) END FOR
输出:最佳摇臂$ {\bf{arm}}_{{\mathrm{best}}}^e = \mathop {{\mathrm{argmax}}}\limits_{{\bf{arm}}_k^e} Q({\bf{arm}}_k^e) $表 1 相关仿真参数设置
参数 最大迭代次数
Tmax探索概率$ \varepsilon(t) $
初始值$ {\varepsilon}_{{\mathrm{init}}} $衰减系数$ \chi $ 业务包
数据量$ {D^e} $传码率$ {R_{\mathrm{B}}} $ 路由汇聚节点
排队时延$ T_j^{e,{\mathrm{wait}}} $时延重要程度$ \alpha $,
成功率重要程度$ \beta $数值 2 × K 0.99 6 600 bit 115 200 1~3 ms的随机值 0.5 -
[1] BEDI G, VENAYAGAMOORTHY G K, SINGH R, et al. Review of Internet of Things (IoT) in electric power and energy systems[J]. IEEE Internet of Things Journal, 2018, 5(2): 847–870. doi: 10.1109/JIOT.2018.2802704. [2] TALEB S M, MERAIHI Y, GABIS A B, et al. Nodes placement in wireless mesh networks using optimization approaches: A survey[J]. Neural Computing and Applications, 2022, 34(7): 5283–5319. doi: 10.1007/s00521–022-06941-y. [3] ALOTAIBI E and MUKHERJEE B. A survey on routing algorithms for wireless ad-hoc and mesh networks[J]. Computer Networks, 2012, 56(2): 940–965. doi: 10.1016/j.comnet.2011.10.011. [4] WANG Lei, ZHANG Lianfang, SHU Yantai, et al. Multipath source routing in wireless ad hoc networks[C]. 2000 Canadian Conference on Electrical and Computer Engineering. Conference Proceedings. Navigating to a New Era (Cat. No. 00TH8492), Halifax, Canada, 2000: 479–483. doi: 10.1109/CCECE.2000.849755. [5] GUO Xiaoyuan, WANG Feng, LIU Jiangchuan, et al. Path diversified multi-QoS optimization in multi-channel wireless mesh networks[J]. Wireless Networks, 2014, 20(6): 1583–1596. doi: 10.1007/s11276-014-0698-x. [6] JIA Dongyao, ZOU Shengxiong, LI Meng, et al. Adaptive multi-path routing based on an improved leapfrog algorithm[J]. Information Sciences, 2016, 367/368: 615–629. doi: 10.1016/j.ins.2016.07.021. [7] SUN Yaohua, PENG Mugen, ZHOU Yangcheng, et al. Application of machine learning in wireless networks: Key techniques and open issues[J]. IEEE Communications Surveys & Tutorials, 2019, 21(4): 3072–3108. doi: 10.1109/COMST.2019.2924243. [8] DI VALERIO V, LO PRESTI F, PETRIOLI C, et al. CARMA: Channel-aware reinforcement learning-based multi-path adaptive routing for underwater wireless sensor networks[J]. IEEE Journal on Selected Areas in Communications, 2019, 37(11): 2634–2647. doi: 10.1109/JSAC.2019.2933968. [9] LIU Qingzhi, CHENG Long, JIA A L, et al. Deep reinforcement learning for communication flow control in wireless mesh networks[J]. IEEE Network, 2021, 35(2): 112–119. doi: 10.1109/MNET.011.2000303. [10] NG P C and SHE J. Remote proximity sensing with a novel Q-learning in Bluetooth low energy network[J]. IEEE Transactions on Wireless Communications, 2022, 21(8): 6156–6166. doi: 10.1109/TWC.2022.3147411. [11] WANG Jinxin, ZHANG Fan, XIE Zhonglin, et al. Joint bandwidth allocation and path selection in WANs with path cardinality constraints[J]. Journal of Communications and Information Networks, 2021, 6(3): 237–250. doi: 10.23919/JCIN.2021.9549120. [12] APPINI N R and REDDY A R. Joint channel assignment and bandwidth reservation using Improved FireFly Algorithm (IFA) in Wireless Mesh Networks (WMN)[J]. Wireless Personal Communications, 2023, 131(1): 455–470. doi: 10.1007/s11277-023-10439-8. [13] BINH L H and DUONG T V T. Load balancing routing under constraints of quality of transmission in mesh wireless network based on software defined networking[J]. Journal of Communications and Networks, 2021, 23(1): 12–22. doi: 10.23919/JCN.2021.000004. [14] KUMAR R, VENKANNA U, and TIWARI V. Opt-ACM: An optimized load balancing based admission control mechanism for software defined hybrid wireless based IoT (SDHW-IoT) network[J]. Computer Networks, 2021, 188: 107888. doi: 10.1016/j.comnet.2021.107888. [15] ALHARBI N, MACKENZIE L, and PEZAROS D. Enhancing graph routing algorithm of industrial wireless sensor networks using the covariance-matrix adaptation evolution strategy[J]. Sensors, 2022, 22(19): 7462. doi: 10.3390/s22197462. [16] BAROLLI A, BYLYKBASHI K, QAFZEZI E, et al. A comparison study of Weibull, normal and Boulevard distributions for wireless mesh networks considering different router replacement methods by a hybrid intelligent simulation system[J]. Journal of Ambient Intelligence and Humanized Computing, 2023, 14(8): 10181–10194. doi: 10.1007/s12652-021-03680-1. [17] ROZHOŇ V, HAEUPLER B, MARTINSSON A, et al. Parallel breadth-first search and exact shortest paths and stronger notions for approximate distances[C]. Proceedings of the 55th Annual ACM Symposium on Theory of Computing, Orlando, USA, 2023: 321–334. doi: 10.1145/3564246.3585235. [18] SILVA N, WERNECK H, SILVA T, et al. Multi-armed bandits in recommendation systems: A survey of the state-of-the-art and future directions[J]. Expert Systems with Applications, 2022, 197: 116669. doi: 10.1016/j.eswa.2022.116669. [19] LEE S, YU H, and LEE H. Multiagent Q-learning-based multi-UAV wireless networks for maximizing energy efficiency: Deployment and power control strategy design[J]. IEEE Internet of Things Journal, 2022, 9(9): 6434–6442. doi: 10.1109/JIOT.2021.3113128. [20] ZAATOURI I, ALYAOUI N, GUILOUFI A B, et al. Design and performance analysis of objective functions for RPL routing protocol[J]. Wireless Personal Communications, 2022, 124(3): 2677–2697. doi: 10.1007/s11277-022-09484-6.