Joint Routing and Resource Scheduling Algorithm for Large-scale Multi-mode Mesh Networks Based on Reinforcement Learning
-
摘要: 为了平衡新型电力系统中大规模多模Mesh网络的传输可靠性和效率,该文在对优化问题进行描述和分析的基础上提出一种基于强化学习的大规模多模Mesh网络联合路由选择及资源调度算法,分为两个阶段。在第1阶段中,根据网络拓扑结构信息和业务需求,利用一种多条最短路径路由算法,输出所有最短路径。在第2阶段中,提出一种基于多臂老虎机(MAB)的资源调度算法,该算法基于得到的最短路径集合构建MAB的摇臂,然后根据业务需求计算回报,最终给出最优的路由选择及资源调度方式用于业务传输。仿真结果表明,所提算法能够满足不同的业务传输需求,实现端到端路径的平均时延和平均传输成功率的高效平衡。Abstract: In order to balance the transmission reliability and efficiency of large-scale multi-mode mesh networks in the new power system, a two-stage algorithm is proposed based on reinforcement learning for joint routing selection and resource scheduling in large-scale multi-mode mesh networks, building upon the description and analysis of optimization problems. In the first stage, based on the network topology information and service requirements, a multi shortest path routing algorithm is utilized to generate all the shortest paths. In the second stage, a resource scheduling algorithm based on Multi-Armed Bandit (MAB) is proposed. The algorithm constructs the arms of the MAB based on the obtained set of shortest paths, then calculates the reward according to the service demands, and finally gives the optimal route selection and resource scheduling mode for service transmission. Simulation results show that the proposed algorithm can meet different service transmission requirements, and achieve an efficient balance between the average end-to-end path delay and the average transmission success rate.
-
1 改进的宽度优先搜索算法
输入:网络拓扑邻接矩阵A,源节点nes (1) 将nes放入队列Queue中;将nes的Visited值设置为True,而其余节点的该值设置为False;将nes到源节点的最短跳数HopCount(nes)值
设置为0,而其余节点的该值设置为无穷大;将所有节点的Searched值设置为False;将每个节点n的前置节点数量FrontCount(n)值设
置为0,表示所有网络节点的前置节点集合FrontPoint(n)中前置节点数量为0(2) WHILE Queue不是空集 DO (3) 获取Queue的队首节点Head,遍历A,找出Head的所有邻居节点Neighborz,z=1,2,⋯,Z (4) FOR Neighborz,z=1,2,⋯,Z DO (5) IF Neighborz的Searched值为True THEN 跳出本次循环 (6) ELSE IF Neighborz的Visited值为False THEN (7) 令HopCount(Neighborz)=HopCount(Head)+1;将FrontCount(Neighborz)值加1 把Head存入FrontPoint(Neighborz)中;把Neighborz存入Queue中,并将其Visited值设置为True (8) ELSE Neighborz的Visited值为True THEN (9) IF HopCount(Head)+1<HopCount(Neighborz) THEN (10) 令HopCount(Neighborz)=HopCount(Head)+1;FrontCount(Neighborz)值保持不变 把FrontPoint(Neighborz)中最近一个存入的元素替换为Head (11) ELSE IF HopCount(Head)+1=HopCount(Neighborz) THEN (12) 将FrontCount(Neighborz)值加1;把Head存入FrontPoint(Neighborz)中 (13) ELSE 跳出本次循环 (14) END IF (15) END IF (16) END FOR (17) 将当前Head的Searched值设置为True,并将其移出Queue (18) END WHILE (19) 按如下方式遍历设置As(asij)中每个元素asij的值:asij={1,j∈FrontPoint(i)0,其他,i,j∈N 输出:最短路径邻接矩阵As 2 多条最短路径输出算法
输入:最短路径邻接矩阵As,源节点nes和目的节点ned (1) 将nes压入主栈Stackmain;遍历As,将nes的邻居节点存入邻居节点列表Array,然后将Array作为栈顶压入辅栈Stackassist (2) WHILE Stackmain不是空集 DO (3) 获取Stackassist栈顶,作为新的Array (4) IF Array非空 THEN (5) 获取Array中的首个元素,将其压入Stackmain,并将剩余元素构成的列表重新压入Stackassist (6) 查询栈顶元素的Array,将Stackmain中包含的元素从其中剔除,再将其压入Stackassist (7) ELSE 将Stackmain和Stackassist的栈顶元素弹出 (8) END IF (9) IF Stackmain的栈顶元素与ned相等 THEN 将最短路径peu=Stackmain存入Pathe;将Stackmain和Stackassist的栈顶元素弹出 (10) END IF (11) END WHILE 输出:最短路径集合Pathe={pe1,pe2,⋯,peU} 3 基于MAB的资源调度选择算法
输入:摇臂集合Arme,最大迭代次数Tmax (1) 令Q(armek)=0,count(armek)=0 (2) FOR t=1,2,⋯,Tmax DO (3) 根据式(16)更新ϵ(t),然后按如下方式选择摇臂armek:armek={从arme1,arme2,⋯,armeK中以均匀分布随机选取,以ε(t)的概率argmaxarmekQt(armek),以1−ε(t)的概率 (4) 令count(armek)=count(armek)+1 (5) 根据式(14)计算Rt(armek),然后根据式(15)更新
Qt(armek)(6) END FOR
输出:最佳摇臂armebest=argmaxarmekQ(armek)表 1 相关仿真参数设置
参数 最大迭代次数
Tmax探索概率ε(t)
初始值εinit衰减系数χ 业务包
数据量De传码率RB 路由汇聚节点
排队时延Te,waitj时延重要程度α,
成功率重要程度β数值 2 × K 0.99 6 600 bit 115 200 1~3 ms的随机值 0.5 -
[1] BEDI G, VENAYAGAMOORTHY G K, SINGH R, et al. Review of Internet of Things (IoT) in electric power and energy systems[J]. IEEE Internet of Things Journal, 2018, 5(2): 847–870. doi: 10.1109/JIOT.2018.2802704. [2] TALEB S M, MERAIHI Y, GABIS A B, et al. Nodes placement in wireless mesh networks using optimization approaches: A survey[J]. Neural Computing and Applications, 2022, 34(7): 5283–5319. doi: 10.1007/s00521–022-06941-y. [3] ALOTAIBI E and MUKHERJEE B. A survey on routing algorithms for wireless ad-hoc and mesh networks[J]. Computer Networks, 2012, 56(2): 940–965. doi: 10.1016/j.comnet.2011.10.011. [4] WANG Lei, ZHANG Lianfang, SHU Yantai, et al. Multipath source routing in wireless ad hoc networks[C]. 2000 Canadian Conference on Electrical and Computer Engineering. Conference Proceedings. Navigating to a New Era (Cat. No. 00TH8492), Halifax, Canada, 2000: 479–483. doi: 10.1109/CCECE.2000.849755. [5] GUO Xiaoyuan, WANG Feng, LIU Jiangchuan, et al. Path diversified multi-QoS optimization in multi-channel wireless mesh networks[J]. Wireless Networks, 2014, 20(6): 1583–1596. doi: 10.1007/s11276-014-0698-x. [6] JIA Dongyao, ZOU Shengxiong, LI Meng, et al. Adaptive multi-path routing based on an improved leapfrog algorithm[J]. Information Sciences, 2016, 367/368: 615–629. doi: 10.1016/j.ins.2016.07.021. [7] SUN Yaohua, PENG Mugen, ZHOU Yangcheng, et al. Application of machine learning in wireless networks: Key techniques and open issues[J]. IEEE Communications Surveys & Tutorials, 2019, 21(4): 3072–3108. doi: 10.1109/COMST.2019.2924243. [8] DI VALERIO V, LO PRESTI F, PETRIOLI C, et al. CARMA: Channel-aware reinforcement learning-based multi-path adaptive routing for underwater wireless sensor networks[J]. IEEE Journal on Selected Areas in Communications, 2019, 37(11): 2634–2647. doi: 10.1109/JSAC.2019.2933968. [9] LIU Qingzhi, CHENG Long, JIA A L, et al. Deep reinforcement learning for communication flow control in wireless mesh networks[J]. IEEE Network, 2021, 35(2): 112–119. doi: 10.1109/MNET.011.2000303. [10] NG P C and SHE J. Remote proximity sensing with a novel Q-learning in Bluetooth low energy network[J]. IEEE Transactions on Wireless Communications, 2022, 21(8): 6156–6166. doi: 10.1109/TWC.2022.3147411. [11] WANG Jinxin, ZHANG Fan, XIE Zhonglin, et al. Joint bandwidth allocation and path selection in WANs with path cardinality constraints[J]. Journal of Communications and Information Networks, 2021, 6(3): 237–250. doi: 10.23919/JCIN.2021.9549120. [12] APPINI N R and REDDY A R. Joint channel assignment and bandwidth reservation using Improved FireFly Algorithm (IFA) in Wireless Mesh Networks (WMN)[J]. Wireless Personal Communications, 2023, 131(1): 455–470. doi: 10.1007/s11277-023-10439-8. [13] BINH L H and DUONG T V T. Load balancing routing under constraints of quality of transmission in mesh wireless network based on software defined networking[J]. Journal of Communications and Networks, 2021, 23(1): 12–22. doi: 10.23919/JCN.2021.000004. [14] KUMAR R, VENKANNA U, and TIWARI V. Opt-ACM: An optimized load balancing based admission control mechanism for software defined hybrid wireless based IoT (SDHW-IoT) network[J]. Computer Networks, 2021, 188: 107888. doi: 10.1016/j.comnet.2021.107888. [15] ALHARBI N, MACKENZIE L, and PEZAROS D. Enhancing graph routing algorithm of industrial wireless sensor networks using the covariance-matrix adaptation evolution strategy[J]. Sensors, 2022, 22(19): 7462. doi: 10.3390/s22197462. [16] BAROLLI A, BYLYKBASHI K, QAFZEZI E, et al. A comparison study of Weibull, normal and Boulevard distributions for wireless mesh networks considering different router replacement methods by a hybrid intelligent simulation system[J]. Journal of Ambient Intelligence and Humanized Computing, 2023, 14(8): 10181–10194. doi: 10.1007/s12652-021-03680-1. [17] ROZHOŇ V, HAEUPLER B, MARTINSSON A, et al. Parallel breadth-first search and exact shortest paths and stronger notions for approximate distances[C]. Proceedings of the 55th Annual ACM Symposium on Theory of Computing, Orlando, USA, 2023: 321–334. doi: 10.1145/3564246.3585235. [18] SILVA N, WERNECK H, SILVA T, et al. Multi-armed bandits in recommendation systems: A survey of the state-of-the-art and future directions[J]. Expert Systems with Applications, 2022, 197: 116669. doi: 10.1016/j.eswa.2022.116669. [19] LEE S, YU H, and LEE H. Multiagent Q-learning-based multi-UAV wireless networks for maximizing energy efficiency: Deployment and power control strategy design[J]. IEEE Internet of Things Journal, 2022, 9(9): 6434–6442. doi: 10.1109/JIOT.2021.3113128. [20] ZAATOURI I, ALYAOUI N, GUILOUFI A B, et al. Design and performance analysis of objective functions for RPL routing protocol[J]. Wireless Personal Communications, 2022, 124(3): 2677–2697. doi: 10.1007/s11277-022-09484-6. -