UAV Trajectory Planning Based on Deep Q-Network for Internet of Things
Abstract: With the wide application of Unmanned Aerial Vehicles (UAVs), UAV-assisted data collection has expanded the application scope of the Internet of Things (IoT), and it is especially suitable for extreme scenarios such as military battlefields and disaster rescue. For these scenarios, this paper proposes a UAV trajectory planning algorithm based on the Deep Q-Network (DQN) framework. The algorithm takes the average Age of Information (AoI) of the data collected within one UAV flight cycle as the optimization objective, in order to keep the collected data fresh. Simulation results show that the proposed algorithm effectively reduces the average AoI of the data collected within a single flight cycle. Compared with the random algorithm, the greedy algorithm based on the maximum AoI, the shortest path algorithm, and the AoI-based Trajectory Planning (ATP) algorithm, it reduces the average AoI by about 81%, 67%, 56%, and 39%, respectively. The work thus achieves efficient, low-latency data collection in UAV-assisted IoT systems.
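As a quick arithmetic check, the reductions quoted in the abstract can be recomputed from the scenario-1 averages listed in Table 3. The sketch below assumes the percentages are relative reductions, (baseline − DQN) / baseline; the helper name `reduction_percent` is illustrative and does not come from the paper.

```python
# Average AoI in seconds, copied from Table 3 (simulation scenario 1).
aoi_scenario_1 = {
    "DQN": 7.7,
    "ATP": 12.7,
    "shortest path": 17.8,
    "greedy": 23.3,
    "random": 42.1,
}


def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Reduction achieved by DQN against each baseline algorithm.
reductions = {
    name: reduction_percent(value, aoi_scenario_1["DQN"])
    for name, value in aoi_scenario_1.items()
    if name != "DQN"
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}%")
```

Under this assumption the results come out at roughly 81.7%, 67.0%, 56.7%, and 39.4% for the random, greedy, shortest path, and ATP algorithms, which matches the approximate figures in the abstract.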
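Table 1 DQN-based UAV trajectory planning algorithm
Input: learning rate $ \alpha $; discount factor $ \gamma $; random action selection parameters $ \varepsilon $ and $ \mu $; update step $ w $
(1) Initialize the parameters of networks $ Q_r $ and $ Q_t $, and set $ \theta_r = \theta_t $
(2) for each training episode do
(3)   Initialize state $ s_k $
(4)   while $ T_k < T_{\max} $ do
(5)     With probability $ \varepsilon $ select a random action $ a_k $; otherwise select $ a_k = \arg\max_a Q_r(s_k, a; \theta_r) $
(6)     Execute action $ a_k $, update the state according to Eqs. (2) and (5), compute the reward, obtain $ \left( s_k, a_k, r_k, s_{k+1} \right) $, and store it in the replay buffer
(7)     if the replay buffer is full then
(8)       Randomly sample $ N_b $ transitions and train according to Eq. (11)
(9)     end if
(10)    if $ k \bmod w = 0 $ then
(11)      $ \theta_t \leftarrow \theta_r $
(12)    end if
(13)    $ s_k \leftarrow s_{k+1} $
(14)    $ \varepsilon \leftarrow \varepsilon - \mu $
(15)  end while
(16) end for

Table 2 DQN algorithm parameters
| Parameter | Learning rate | Discount factor | Random probability | Decay factor | Hyperparameter | Replay buffer | Training batch | Update step |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Symbol | $ \alpha $ | $ \gamma $ | $ \varepsilon $ | $ \mu $ | $ \lambda $ | $ |D| $ | $ |B| $ | $ w $ |
| Value | 0.001 | 0.9 | 0.95 | 0.0001 | 10 | 3000 | 128 | 100 |

Table 3 AoI performance comparison of different algorithms (s)
| Algorithm | DQN | ATP | Shortest path | Greedy | Random |
| --- | --- | --- | --- | --- | --- |
| Simulation scenario 1 | 7.7 | 12.7 | 17.8 | 23.3 | 42.1 |
| Simulation scenario 2 | 8.8 | 14.3 | 19.3 | 27.6 | 45.1 |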
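The control flow of Table 1 can be sketched in a few lines of Python, using the hyperparameter values from Table 2. This is only an illustration under several assumptions: a small Q-table stands in for the deep networks $ Q_r $ and $ Q_t $, the environment `toy_step` is a hypothetical five-cell corridor rather than the paper's UAV model, the training rule is a standard one-step temporal-difference update (Eq. (11) is not reproduced here), the `while` condition is replaced by a fixed number of steps per episode, and $ \varepsilon $ is clamped at zero so it cannot become negative.

```python
import random
from collections import deque

# Hyperparameters taken from Table 2.
ALPHA, GAMMA = 0.001, 0.9                     # learning rate, discount factor
EPSILON, MU = 0.95, 0.0001                    # initial random probability, decay per step
BUFFER_SIZE, BATCH_SIZE, W = 3000, 128, 100   # |D|, |B|, update step w

# Hypothetical toy problem (NOT the paper's UAV model): a 5-cell corridor.
N_STATES, N_ACTIONS = 5, 2


def toy_step(state, action):
    """Assumed environment: action 1 moves right, action 0 moves left;
    the reward is 1 whenever the next state is the last cell, else 0."""
    if action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)


def train(episodes, steps_per_episode, seed=0):
    rng = random.Random(seed)
    # Step (1): a Q-table stands in for the networks; both copies start equal.
    theta_r = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    theta_t = [row[:] for row in theta_r]
    buffer = deque(maxlen=BUFFER_SIZE)
    eps, k = EPSILON, 0
    for _ in range(episodes):                       # step (2)
        s = 0                                       # step (3)
        for _ in range(steps_per_episode):          # step (4), fixed horizon assumed
            # Step (5): epsilon-greedy action selection on Q_r.
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: theta_r[s][i])
            # Step (6): act, observe, and store the transition.
            s_next, r = toy_step(s, a)
            buffer.append((s, a, r, s_next))
            # Steps (7)-(9): train on a random mini-batch once the buffer is full.
            if len(buffer) == BUFFER_SIZE:
                for bs, ba, br, bn in rng.sample(list(buffer), BATCH_SIZE):
                    target = br + GAMMA * max(theta_t[bn])   # uses the target copy
                    theta_r[bs][ba] += ALPHA * (target - theta_r[bs][ba])
            # Steps (10)-(12): periodically copy Q_r into the target copy Q_t.
            k += 1
            if k % W == 0:
                theta_t = [row[:] for row in theta_r]
            s = s_next                              # step (13)
            eps = max(eps - MU, 0.0)                # step (14), clamp is an assumption
    return theta_r, eps


q_table, final_eps = train(episodes=80, steps_per_episode=50)
```

With 80 episodes of 50 steps, the buffer fills after 3000 steps and the remaining 1000 steps each train on 128 sampled transitions. Because rewards are non-negative and the update is a convex combination of the old value and the target, every entry of `q_table` stays between 0 and $ 1/(1-\gamma) = 10 $.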