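Research on Dispersion Strategy for Multiple Unmanned Ground Vehicles Based on Auction Multi-agent Deep Deterministic Policy Gradient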

GUO Hongda, LOU Jingtao, YANG Zhenzhen, XU Youchun

GUO Hongda, LOU Jingtao, YANG Zhenzhen, XU Youchun. Research on Dispersion Strategy for Multiple Unmanned Ground Vehicles Based on Auction Multi-agent Deep Deterministic Policy Gradient[J]. Journal of Electronics & Information Technology, 2024, 46(1): 287-298. doi: 10.11999/JEIT221582

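    About the authors:

    GUO Hongda: male, Ph.D. candidate; research interests include multi-UGV path planning and inter-vehicle communication

    LOU Jingtao: male, Ph.D., engineer; research interests include machine vision and civil-military integration of autonomous driving

    YANG Zhenzhen: female, M.S., teaching assistant; research interests include military transportation of foreign armed forces and U.S. military air transportation

    XU Youchun: male, professor and doctoral supervisor; research interests include unmanned system architecture and machine learning

    Corresponding author:

    LOU Jingtao loujt_1984@126.com

  • CLC number: TN911.7; T249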


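  • Abstract: Dispersion of multiple Unmanned Ground Vehicles (multi-UGV) is widely used in military combat missions, but existing methods are complex, need long planning times, and have limited applicability. To address this, a multi-UGV dispersion strategy based on the Auction Multi-Agent Deep Deterministic Policy Gradient (AU-MADDPG) algorithm is proposed. Building on a single-UGV model, a multi-UGV dispersion model based on deep reinforcement learning is established. The MADDPG structure is optimized: an auction algorithm computes the dispersion point assigned to each UGV so that the total path length is minimized, which reduces the randomness of dispersion-point allocation, and MADDPG then plans the paths, improving both training and runtime efficiency. The reward function is also optimized: it covers two stages, the training process and its termination, and takes all constraints into account, turning the multi-constraint problem into a reward-function design problem whose objective is to maximize the reward. Simulation results show that, compared with the conventional MADDPG algorithm, the proposed algorithm reduces training time by 3.96% and total path length by 14.50%, solves the dispersion problem more effectively, and can serve as a general solution to this class of problems.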
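The abstract says an auction algorithm assigns each UGV the dispersion point that minimizes the total path length before MADDPG plans the paths, but it does not say which auction variant is used. The following is a minimal sketch assuming Bertsekas' forward auction for the symmetric assignment problem, with straight-line (Euclidean) distance as the cost, equal numbers of UGVs and dispersion points, and obstacles ignored; the function name `auction_assign` and the bid increment `eps` are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def auction_assign(ugvs: Sequence[Point], targets: Sequence[Point],
                   eps: float = 1e-3) -> List[int]:
    """Forward auction (Bertsekas) assigning each UGV to one dispersion point.

    Assumption: benefit a[i][j] = -Euclidean distance, so maximizing total
    benefit minimizes total straight-line distance. Requires
    len(ugvs) == len(targets). Returns assignment[i] = target index of UGV i.
    """
    n = len(ugvs)
    benefit = [[-math.dist(u, t) for t in targets] for u in ugvs]
    prices = [0.0] * n
    owner = [-1] * n        # owner[j]: UGV currently holding target j
    assignment = [-1] * n   # assignment[i]: target currently held by UGV i
    unassigned = list(range(n))

    while unassigned:
        i = unassigned.pop()
        # Net value of every target for bidder i at the current prices.
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=lambda j: values[j])
        best_val = values[best]
        second_val = max((values[j] for j in range(n) if j != best),
                         default=best_val)
        # Raise the price of the best target by the bid increment.
        prices[best] += best_val - second_val + eps
        # The previous holder is outbid and rejoins the queue of bidders.
        if owner[best] != -1:
            prev = owner[best]
            assignment[prev] = -1
            unassigned.append(prev)
        owner[best] = i
        assignment[i] = best
    return assignment


if __name__ == "__main__":
    ugvs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    points = [(2.0, 3.0), (0.0, 3.0), (1.0, 3.0)]
    print(auction_assign(ugvs, points))  # -> [1, 2, 0]
```

By ε-complementary slackness, the total distance returned is within `n * eps` of the optimal assignment, so a small `eps` trades extra bidding rounds for near-optimality. In the three-vehicle example each UGV is sent straight ahead, for a total of 9.0.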
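  • Figure 1  Schematic of the multi-UGV dispersion scenario

    Figure 2  Kinematic model of the UGV

    Figure 3  Training and testing framework for multi-UGV dispersion

    Figure 4  Flowchart of the multi-UGV dispersion algorithm

    Figure 5  Training environment

    Figure 6  Schematic of the dispersion environments

    Figure 7  Multi-UGV dispersion trajectories under different algorithms

    Figure 8  Average reward of the algorithms

    Figure 9  Time consumed during training

    Figure 10  Time consumed during testing

    Figure 11  Multi-UGV dispersion paths based on the genetic algorithm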

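    Table 1  AU-MADDPG parameter settings

    Parameter                          Value
    Replay buffer size M               1000
    Actor learning rate l_a            0.01
    Critic learning rate l_c           0.01
    Mini-batch size N                  32
    Total number of iterations E       10^5
    Maximum steps per iteration T      25
    Network update rate τ              0.01
    Sampling time δt (s)               0.1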
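Table 1 lists a network update rate τ = 0.01. In the original DDPG and MADDPG formulations, τ is the coefficient of the soft (Polyak) target-network update θ' ← τθ + (1 − τ)θ'. Assuming the paper follows that standard rule, the sketch below applies it to parameters stored as plain Python lists; the dict-of-lists layout and the name `soft_update` are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def soft_update(target: Params, online: Params, tau: float = 0.01) -> None:
    """In-place soft update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    for name, weights in online.items():
        target[name] = [tau * w + (1.0 - tau) * w_t
                        for w, w_t in zip(weights, target[name])]


if __name__ == "__main__":
    online = {"w": [1.0, 2.0]}
    target = {"w": [0.0, 0.0]}
    for _ in range(69):
        soft_update(target, online, tau=0.01)
    # After 69 updates the remaining gap is 0.99**69, roughly 0.5.
    print(target["w"])
```

With τ = 0.01 the target network closes half of its gap to the online network in about 69 updates (0.99^69 ≈ 0.5), which keeps the critic's bootstrapped targets slowly varying.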

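    Table 2  Comparison of path lengths over 100 tests

    Environment     Metric           MADQN      MADDPG     AU-MADDPG
    Obstacle-free   Total length     545.272    240.915    205.959
                    Longest path     13.284     5.482      5.618
                    Shortest path    1.883      0.138      0.220
    Off-road        Total length     602.498    285.717    258.436
                    Longest path     13.875     5.836      5.661
                    Shortest path    2.132      0.299      0.282
    Urban           Total length     692.331    346.120    288.96
                    Longest path     15.062     7.053      6.915
                    Shortest path    2.125      0.526      0.469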
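The 14.50% path-length reduction quoted in the abstract can be checked against Table 2. The short script below only copies the table's totals and computes the MADDPG to AU-MADDPG reduction per environment: about 14.51% in the obstacle-free environment, which agrees with the abstract's figure up to rounding, and about 9.55% (off-road) and 16.51% (urban) in the other two.

```python
# Total path lengths over 100 tests, copied from Table 2.
TOTAL_LENGTH = {
    "obstacle-free": {"MADQN": 545.272, "MADDPG": 240.915, "AU-MADDPG": 205.959},
    "off-road": {"MADQN": 602.498, "MADDPG": 285.717, "AU-MADDPG": 258.436},
    "urban": {"MADQN": 692.331, "MADDPG": 346.120, "AU-MADDPG": 288.96},
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is smaller than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


for env, row in TOTAL_LENGTH.items():
    pct = relative_reduction(row["MADDPG"], row["AU-MADDPG"])
    print(f"{env}: {pct:.2f}%")  # 14.51%, 9.55%, 16.51% respectively
```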

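    Table 3  Comparison of running time

    Environment     Metric                            MADQN        MADDPG      AU-MADDPG
    Obstacle-free   Training, 40000 iterations (s)    14548.709    8417.064    8083.574
                    Testing, 100 runs (s)             45.626       6.441       6.324
    Off-road        Training, 40000 iterations (s)    15129.661    8852.918    8366.883
                    Testing, 100 runs (s)             50.770       7.397       6.935
    Urban           Training, 40000 iterations (s)    14998.758    8997.201    8401.293
                    Testing, 100 runs (s)             51.215       7.459       7.061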
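Likewise, the 3.96% training-time reduction in the abstract corresponds to the obstacle-free row of Table 3. This sketch copies the training times for 40000 iterations and computes the MADDPG to AU-MADDPG reduction in each environment: 3.96% (obstacle-free), 5.49% (off-road), and 6.62% (urban).

```python
# Training time (s) for 40000 iterations, copied from Table 3.
TRAIN_TIME = {
    "obstacle-free": {"MADDPG": 8417.064, "AU-MADDPG": 8083.574},
    "off-road": {"MADDPG": 8852.918, "AU-MADDPG": 8366.883},
    "urban": {"MADDPG": 8997.201, "AU-MADDPG": 8401.293},
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is smaller than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


for env, row in TRAIN_TIME.items():
    pct = relative_reduction(row["MADDPG"], row["AU-MADDPG"])
    print(f"{env}: {pct:.2f}%")  # 3.96%, 5.49%, 6.62% respectively
```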

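    Table 4  Performance of MADDPG with a single optimization applied

    Metric                                 Optimized reward function    Auction algorithm introduced
    Training time, 40000 iterations (s)    8151.701                     8305.227
    Average reward                         -310.411                     -382.506
    Testing time, 100 runs (s)             6.353                        6.409
    Total length                           235.881                      210.680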

    Table 5  Test results of the genetic algorithm

    Iterations    Time (s)    Total path    Shortest path    Longest path
    100           64.323      395.670       0.352            10.557
Figures (12) / Tables (5)
Publication history
  • Received:  2023-01-02
  • Revised:  2023-05-12
  • Published online:  2023-05-22
  • Issue date:  2024-01-17
