UAV Trajectory Planning Based on Deep Q-Network for Internet of Things

ZHANG Jianhang; KANG Kai; QIAN Hua; YANG Miao

doi:10.11999/JEIT210962

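ZHANG Jianhang^{1, 2},
KANG Kai^{1},
QIAN Hua^{1},
YANG Miao^{1, 3}

1. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China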

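Funds: The National Key Research and Development Program of China (2020YFB2205603), The National Natural Science Foundation of China (61971286), The Science and Technology Commission Foundation of Shanghai (19DZ1204300)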

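Author biographies:
ZHANG Jianhang: male, PhD candidate. His research interest is UAV-assisted IoT communication.

KANG Kai: male, professor-level senior engineer. His research interest is the physical layer of wireless communications.

QIAN Hua: male, research fellow. His research interests include wireless communications, nonlinear signal processing, and big data signal processing.

YANG Miao: male, PhD candidate. His research interests include edge intelligence and reinforcement learning.

Corresponding author:
KANG Kai, kangk@sari.ac.cn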

CLC number: TP92
Publication history
- Received: 2021-09-09
- Revised: 2021-11-05
- Available online: 2022-04-14
- Issue date: 2022-11-14

Abstract

With the wide application of Unmanned Aerial Vehicles (UAVs), UAV-assisted data collection has expanded the application scope of the Internet of Things (IoT) and is especially suitable for extreme scenarios such as military battlefields and disaster rescue. For these scenarios, this paper proposes a UAV trajectory planning algorithm based on the Deep Q-Network (DQN) framework. The algorithm takes the average Age of Information (AoI) of the data collected within a UAV flight cycle as the optimization objective, so as to keep the collected data fresh. Simulation results show that the proposed algorithm effectively reduces the average AoI of the data collected within a single flight cycle. Compared with the random algorithm, the greedy algorithm based on the maximum AoI, the shortest path algorithm, and the AoI-based Trajectory Planning (ATP) algorithm, it reduces the average AoI by about 81%, 67%, 56%, and 39%, respectively. The work thus achieves efficient, low-latency data collection in UAV-assisted IoT systems.

Keywords:
- Unmanned Aerial Vehicle (UAV)
- Internet of Things (IoT)
- Age of Information (AoI)
- Trajectory planning
- Deep Q-Network (DQN)

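Figure 1 Block diagram of the DQN algorithm

Figure 2 Performance comparison of different algorithms in the two simulation scenarios

Figure 3 Comparison of the average AoI across ground nodes

Figure 4 Effect of the number of nodes on the performance of the DQN algorithm

Figure 5 Effect of UAV flight speed on the performance of the DQN algorithm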

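Table 1 UAV trajectory planning algorithm based on DQN

Input: learning rate $\alpha$; discount factor $\gamma$; random action selection parameters $\varepsilon$ and $\mu$; update step $w$
(1) Initialize the parameters of networks ${Q_r}$ and ${Q_t}$, and set ${\theta _r} = {\theta _t}$
(2) for each training episode do
(3)     Initialize state ${s_k}$
(4)     while ${T_k} < {T_{\max }}$ do
(5)         With probability $\varepsilon$ select a random action ${a_k}$; otherwise select ${a_k} = \arg {\max _a}{Q_r}({s_k},a;{\theta _r})$
(6)         Execute action ${a_k}$, update the state according to Eqs. (2) and (5), compute the reward, and store the resulting transition $\left( {{s_k},{a_k},{r_k},{s_{k + 1}}} \right)$ in the replay buffer
(7)         if the replay buffer is full then
(8)             Randomly sample ${N_b}$ transitions and train according to Eq. (11)
(9)         end if
(10)         if $k\bmod w = 0$ then
(11)             ${\theta _t} \leftarrow {\theta _r}$
(12)         end if
(13)         ${s_k} \leftarrow {s_{k + 1}}$
(14)         $\varepsilon \leftarrow \varepsilon - \mu$
(15)     end while
(16) end for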
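
The loop in Table 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: Eqs. (2), (5) and (11) are not reproduced in this section, so the state update, the reward (negative scaled mean age) and the training step (a semi-gradient update with a terminal flag) are stand-ins. A linear function approximator replaces the deep network so the example needs only the standard library, and the node layout, speed, cycle length and all function names are hypothetical.

```python
import math
import random
from collections import deque

# Hypothetical toy problem: node layout, speed, cycle length and reward are assumptions.
NODES = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0), (30.0, 40.0)]  # ground nodes (m)
SPEED, T_MAX = 10.0, 60.0      # UAV speed (m/s), flight-cycle length T_max (s)
ALPHA, GAMMA = 0.001, 0.9      # learning rate and discount factor (Table 2)
EPS0, MU = 0.95, 0.0001        # initial exploration probability and its decay (Table 2)


def features(pos, ages):
    """State vector: one-hot UAV position, node ages scaled by T_MAX, bias term."""
    onehot = [1.0 if i == pos else 0.0 for i in range(len(NODES))]
    return onehot + [a / T_MAX for a in ages] + [1.0]


def q_values(theta, s):
    """Linear stand-in for a Q-network: one weight row per action."""
    return [sum(w * x for w, x in zip(row, s)) for row in theta]


def env_step(pos, ages, action):
    """Fly to node `action`: all ages grow by the flight time, the visited node resets."""
    dt = max(math.dist(NODES[pos], NODES[action]) / SPEED, 1.0)  # hovering costs 1 s
    ages = [a + dt for a in ages]
    ages[action] = 0.0
    reward = -sum(ages) / (len(ages) * T_MAX)  # assumed reward: negative scaled mean age
    return action, ages, reward, dt


def train(episodes, buffer_size, batch_size, sync_every, seed=0):
    rng = random.Random(seed)
    n, dim = len(NODES), 2 * len(NODES) + 1
    theta_r = [[0.0] * dim for _ in range(n)]   # online network Q_r
    theta_t = [row[:] for row in theta_r]       # (1) target network Q_t starts equal
    pool, eps, k = deque(maxlen=buffer_size), EPS0, 0
    for _ in range(episodes):                   # (2)
        pos, ages, t = 0, [0.0] * n, 0.0        # (3) initial state
        while t < T_MAX:                        # (4)
            s = features(pos, ages)
            if rng.random() < eps:              # (5) epsilon-greedy action
                a = rng.randrange(n)
            else:
                q = q_values(theta_r, s)
                a = q.index(max(q))
            # (6) act and store the transition; overwriting pos/ages is step (13)
            pos, ages, r, dt = env_step(pos, ages, a)
            t += dt
            pool.append((s, a, r, features(pos, ages), t >= T_MAX))
            if len(pool) == buffer_size:        # (7)-(8) train on a random minibatch
                for sb, ab, rb, s2, done in rng.sample(list(pool), batch_size):
                    target = rb if done else rb + GAMMA * max(q_values(theta_t, s2))
                    td = target - q_values(theta_r, sb)[ab]
                    theta_r[ab] = [w + ALPHA * td * x for w, x in zip(theta_r[ab], sb)]
            k += 1
            if k % sync_every == 0:             # (10)-(11) sync target with online
                theta_t = [row[:] for row in theta_r]
            eps = max(0.0, eps - MU)            # (14) linear decay, floored at 0
    return theta_r, eps, k
```

Calling `train(episodes=20, buffer_size=50, batch_size=8, sync_every=10)` runs quickly on this toy problem. The buffer size 3000, batch size 128 and update step 100 from Table 2 can be passed instead, at the cost of a longer run before the buffer fills and training begins.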


Table 2 DQN algorithm parameters

Parameter	Learning rate	Discount factor	Exploration probability	Decay factor	Hyperparameter	Replay buffer size	Batch size	Update step
Symbol	$\alpha$	$\gamma$	$\varepsilon$	$\mu$	$\lambda$	$|D|$	$|B|$	$w$
Value	0.001	0.9	0.95	0.0001	10	3000	128	100
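
To make the exploration schedule implied by Table 2 concrete, the sketch below stores the listed values in a small configuration object and evaluates the linear decay $\varepsilon \leftarrow \varepsilon - \mu$ from Table 1. The class and field names are illustrative, and the floor at zero is an assumption, since the algorithm table does not say what happens once $\varepsilon$ would become negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Values copied from Table 2; the field names are illustrative."""
    learning_rate: float = 0.001    # alpha
    discount: float = 0.9           # gamma
    epsilon_start: float = 0.95     # epsilon
    epsilon_decay: float = 0.0001   # mu, subtracted once per step
    lam: float = 10                 # lambda (its role is not specified in this section)
    buffer_size: int = 3000         # |D|
    batch_size: int = 128           # |B|
    sync_step: int = 100            # w


def epsilon_at(step: int, cfg: DQNConfig = DQNConfig()) -> float:
    """Exploration probability after `step` decrements, floored at 0 (assumed)."""
    return max(0.0, cfg.epsilon_start - cfg.epsilon_decay * step)
```

With these values the exploration probability falls from 0.95 to zero after about 9,500 steps, so later steps act greedily with respect to ${Q_r}$.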


Table 3 AoI performance comparison of different algorithms (s)

Algorithm	DQN	ATP	Shortest path	Greedy	Random
Scenario 1	7.7	12.7	17.8	23.3	42.1
Scenario 2	8.8	14.3	19.3	27.6	45.1
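
The percentage reductions quoted in the abstract can be checked against Table 3 with a short calculation. The sketch below computes the relative AoI reduction of DQN against each baseline as (baseline − DQN) / baseline. The function name is illustrative, and how the authors aggregated the two scenarios is not stated here, so the per-scenario figures are only expected to be close to the quoted ones.

```python
# Average AoI in seconds, copied from Table 3 as (scenario 1, scenario 2).
AOI = {
    "DQN": (7.7, 8.8),
    "ATP": (12.7, 14.3),
    "Shortest path": (17.8, 19.3),
    "Greedy": (23.3, 27.6),
    "Random": (42.1, 45.1),
}


def reduction_percent(baseline: str, scenario: int) -> float:
    """Relative AoI reduction of DQN versus `baseline`; scenario index is 0 or 1."""
    dqn, base = AOI["DQN"][scenario], AOI[baseline][scenario]
    return 100.0 * (base - dqn) / base


if __name__ == "__main__":
    for name in ("Random", "Greedy", "Shortest path", "ATP"):
        s1, s2 = reduction_percent(name, 0), reduction_percent(name, 1)
        print(f"{name}: {s1:.1f}% (scenario 1), {s2:.1f}% (scenario 2)")
```

Every per-scenario value lands within about two percentage points of the 81%, 67%, 56%, and 39% stated in the abstract.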

