深度强化学习赋能的无人机集群仿生行为建模方法

何明 吴晶晶 韩伟 刘思聪 潘璠 夏恒煜

引用本文: 何明, 吴晶晶, 韩伟, 刘思聪, 潘璠, 夏恒煜. 深度强化学习赋能的无人机集群仿生行为建模方法[J]. 电子与信息学报. doi: 10.11999/JEIT251103
Citation: HE Ming, WU Jingjing, HAN Wei, LIU Sicong, PAN Pan, XIA Hengyu. Bionic Behavior Modeling Method for Unmanned Aerial Vehicle Swarms Empowered by Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251103


doi: 10.11999/JEIT251103 cstr: 32379.14.JEIT251103
基金项目: 国家自然科学基金(62273356),国家人才项目(2022-JCJQ-ZQ-001),国家重点研发计划(2024YFF140140)
详细信息
    作者简介:

    何明:男,博士,教授,研究方向为人工智能技术与群体智能控制

    吴晶晶:男,硕士生,研究方向为无人集群指挥控制

    韩伟:女,博士,研究方向为无人化指挥控制

    刘思聪:男,博士生,研究方向为群体智能及无人集群敏捷控制

    潘璠:男,博士,副研究员,研究方向为网络信息安全

    夏恒煜:男,硕士生,研究方向为无人集群指挥控制

    通讯作者:

    吴晶晶 wujingjing7@126.com

  • 中图分类号: TP183

Bionic Behavior Modeling Method for Unmanned Aerial Vehicle Swarms Empowered by Deep Reinforcement Learning

Funds: The National Natural Science Foundation of China (62273356), The National Talent Program (2022-JCJQ-ZQ-001), The National Key Research and Development Program (2024YFF140140)
  • 摘要: 该文针对生物群体协同行为向无人机集群工程模型转化的难题,结合群体仿生智能(BSI)与深度强化学习(DRL)融合演进的趋势,聚焦仿生映射理论与建模方法创新,梳理BSI-DRL驱动的无人机集群建模进展与挑战。首先,明确群体仿生智能的概念与核心特征,分析其3个阶段的发展范式跃迁及技术价值,解析4类典型生物群体协同机制,提炼仿生映射的3个关键步骤;其次,围绕BSI-DRL核心范式,综述仿生规则参数化DRL优化、仿生规则生成式多智能体强化学习、动态角色分配与分层DRL协同优化3大方向的技术优势与挑战,并开展横向对比;最后,展望跨物种生物机制融合、BSI-DRL闭环协同、仿鸟群相变控制与DRL融合等未来方向,为技术工程化落地提供理论支撑。
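为便于理解摘要与图6中"仿鸟群相变控制"所依赖的有序-无序相变概念,下面给出一个基于Vicsek模型[35]思想的极简数值示意(个体数、噪声取值等参数均为示意性假设,并非文献[4,40–43]的原始实现):噪声幅值较小时,集群极化序参量接近1,呈现有序群飞;噪声增大后序参量跌向0,集群失序。相变控制方法正是围绕该序参量进行调控。

```python
import numpy as np

rng = np.random.default_rng(1)

# 示意性参数(假设值):个体数、周期边界边长、速度大小、交互半径、时间步长
N, L, V0, R, DT = 150, 10.0, 0.3, 1.0, 1.0

def vicsek_polarization(eta, steps=200):
    """在噪声幅值 eta 下运行 Vicsek 模型,返回末态极化序参量。"""
    pos = rng.uniform(0, L, (N, 2))
    theta = rng.uniform(-np.pi, np.pi, N)
    for _ in range(steps):
        new_theta = np.empty(N)
        for i in range(N):
            d = np.abs(pos - pos[i])
            d = np.minimum(d, L - d)                 # 周期边界下的最近像距离
            mask = (d ** 2).sum(axis=1) < R ** 2     # 半径 R 内的邻居(含自身)
            # 对齐规则:取邻居平均运动方向,再叠加均匀随机噪声
            mean_dir = np.arctan2(np.sin(theta[mask]).mean(),
                                  np.cos(theta[mask]).mean())
            new_theta[i] = mean_dir + eta * rng.uniform(-0.5, 0.5)
        theta = new_theta
        pos = (pos + V0 * DT * np.c_[np.cos(theta), np.sin(theta)]) % L
    # 极化序参量:归一化平均速度矢量的模,1 表示完全有序,0 表示完全无序
    return float(np.hypot(np.cos(theta).mean(), np.sin(theta).mean()))

for eta in (0.5, 2.0, 5.0):
    print(f"噪声幅值 eta={eta}: 极化序参量 ≈ {vicsek_polarization(eta):.2f}")
```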
  • 图  1  概念关系及相互作用

    图  2  BSI在无人机集群中的应用范式跃迁[7]

    图  3  鸽群层级交互机制

    图  4  狼群捕猎机制概念图

    图  5  群体仿生智能映射方法论框架

    图  6  无人机集群仿鸟群行为相变控制部分成果[4,40–43]

    图  7  3类BSI-DRL的无人机集群建模范式[48]

    表  1  BSI-DRL 建模方法量化对比表

    仿生规则参数化优化
    Q-learning优化Boids参数[53]:性能指标为实现连续避障、空间覆盖最大化,成功规避障碍并维持个体间最优间距;样本效率为100次训练;可扩展性为32架;通信开销未提及量化数据;其他指标为支持编队动态扩张与收缩。
    复合人工势场DQN[57]:性能指标为最小安全距离≥20 m(对比VAPF提升30%),轨迹偏差累计降低25%;样本效率为训练200轮收敛;可扩展性为12架;通信开销为通信半径100 m、分布式通信;其他指标为轨迹更平滑,避障后快速回归路线。

    仿生规则生成式 MADRL
    集中训练分布式执行改进[78]:性能指标为协同作战任务完成率超91%,集群内碰撞率<2%;样本效率为训练60000轮收敛;可扩展性为3架;通信开销为分布式通信;其他指标为奖励值稳定在30±5。
    生存目标驱动DRL[63]:性能指标为群聚行为涌现率100%,集群内碰撞率≈0%;样本效率为训练步数500000;可扩展性为60架;通信开销为局部交互邻居数≤5;其他指标为集群内个体平均间距均匀。

    动态角色分配与分层协同
    裂变-融合强化学习对抗[77]:性能指标为动态障碍规避成功率100%,母群任务完成率100%;样本效率为训练步数500000;可扩展性为20架(裂变后子群≥3架);通信开销降低50%~85%;其他指标为极化指数稳定在0.85以上。
    中央任务分配子代理执行[73]:性能指标为跟踪持续率≥95%,目标获取时间≤72 s;样本效率未提及;可扩展性为8~10架;通信开销为分布式通信及子代理间局部通信;其他指标为单任务飞行时间缩短。
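针对表1中"仿生规则参数化优化"一类方法,下面给出一个极简示意(仅为说明思路的假设性实现,并非文献[53]的原始算法):用单状态表格型Q-learning在若干组离散的Boids权重(分离、对齐、聚合)中进行选择,以回合末的集群极化指数作为奖励;其中无人机数量、候选权重与奖励设计均为示意性假设。实际方法通常在连续参数空间、并结合避障等约束条件下训练。

```python
import numpy as np

rng = np.random.default_rng(0)

N, STEPS = 20, 200           # 无人机数量与每回合仿真步数(示意值)
R_NEIGHBOR, DT = 5.0, 0.1    # 邻居感知半径与积分步长
ACTIONS = np.array([         # 动作:对(分离, 对齐, 聚合)三项权重的离散候选
    [1.5, 1.0, 0.5],
    [1.0, 1.5, 1.0],
    [0.5, 1.0, 1.5],
])

def boids_step(pos, vel, w):
    """按给定权重 w=(分离, 对齐, 聚合) 更新一步 Boids 速度与位置。"""
    new_vel = vel.copy()
    for i in range(len(pos)):
        d = pos - pos[i]
        dist = np.linalg.norm(d, axis=1)
        mask = (dist > 0) & (dist < R_NEIGHBOR)
        if not mask.any():
            continue
        sep = -(d[mask] / dist[mask, None] ** 2).sum(axis=0)   # 分离:远离过近邻居
        ali = vel[mask].mean(axis=0) - vel[i]                  # 对齐:靠拢邻居平均速度
        coh = pos[mask].mean(axis=0) - pos[i]                  # 聚合:朝向邻居质心
        new_vel[i] += DT * (w[0] * sep + w[1] * ali + w[2] * coh)
        speed = np.linalg.norm(new_vel[i])
        if speed > 2.0:                                        # 限幅,避免速度发散
            new_vel[i] *= 2.0 / speed
    return pos + DT * new_vel, new_vel

def polarization(vel):
    """极化指数:速度方向一致性,1 表示完全同向。"""
    unit = vel / (np.linalg.norm(vel, axis=1, keepdims=True) + 1e-9)
    return float(np.linalg.norm(unit.mean(axis=0)))

# 单状态表格型 Q-learning:学习选择哪组 Boids 权重能最大化末态极化指数
Q = np.zeros(len(ACTIONS))
alpha, eps = 0.1, 0.2
for episode in range(50):
    a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q.argmax())
    pos = rng.uniform(0, 20, (N, 2))
    vel = rng.uniform(-1, 1, (N, 2))
    for _ in range(STEPS):
        pos, vel = boids_step(pos, vel, ACTIONS[a])
    reward = polarization(vel)          # 以回合末极化指数作为奖励
    Q[a] += alpha * (reward - Q[a])     # 无后继状态的单步 Q 值更新
print("各组权重的估计价值:", np.round(Q, 3))
print("学得的权重组合:", ACTIONS[int(Q.argmax())])
```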
  • [1] FAN Ruitao, WANG Jintao, HAN Weixin, et al. UAV swarm control based on hybrid bionic swarm intelligence[J]. Guidance, Navigation and Control, 2023, 3(2): 2350008. doi: 10.1142/S2737480723500085.
    [2] LONG Weifan, HOU Taixian, WEI Xiaoyi, et al. A survey on population-based deep reinforcement learning[J]. Mathematics, 2023, 11(10): 2234. doi: 10.3390/math11102234.
    [3] BENI G and WANG Jing. Swarm intelligence in cellular robotic systems[M]//DARIO P, SANDINI G, and AEBISCHER P. Robots and Biological Systems: Towards a New Bionics?. Berlin: Springer, 1993: 703–712. doi: 10.1007/978-3-642-58069-7_38.
    [4] 何明, 陈浩天, 韩伟, 等. 无人机仿鸟群协同控制发展现状及关键技术[J]. 航空学报, 2024, 45(20): 029946. doi: 10.7527/S1000-6893.2024.29946.

    HE Ming, CHEN Haotian, HAN Wei, et al. Development status and key technologies of cooperative control of bird-inspired UAV swarms[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(20): 029946. doi: 10.7527/S1000-6893.2024.29946.
    [5] 段海滨, 邵山, 苏丙未, 等. 基于仿生智能的无人作战飞机控制技术发展新思路[J]. 中国科学: 技术科学, 2010, 40(8): 853–860.

    DUAN Haibin, SHAO Shan, SU Bingwei, et al. New development thoughts on the bio-inspired intelligence based control for unmanned combat aerial vehicle[J]. Science China Technological Sciences, 2010, 53(8): 2025–2031. doi: 10.1007/s11431-010-3160-z.
    [6] 邱华鑫, 段海滨, 范彦铭. 基于鸽群行为机制的多无人机自主编队[J]. 控制理论与应用, 2015, 32(10): 1298–1304. doi: 10.7641/CTA.2015.50314.

    QIU Huaxin, DUAN Haibin, and FAN Yanming. Multiple unmanned aerial vehicle autonomous formation based on the behavior mechanism in pigeon flocks[J]. Control Theory & Applications, 2015, 32(10): 1298–1304. doi: 10.7641/CTA.2015.50314.
    [7] 梁鸿涛, 王耀南, 华和安, 等. 无人集群系统深度强化学习控制研究进展[J]. 工程科学学报, 2024, 46(9): 1521–1534. doi: 10.13374/j.issn2095-9389.2023.07.30.001.

    LIANG Hongtao, WANG Yaonan, HUA Hean, et al. Deep reinforcement learning to control an unmanned swarm system[J]. Chinese Journal of Engineering, 2024, 46(9): 1521–1534. doi: 10.13374/j.issn2095-9389.2023.07.30.001.
    [8] NTI I K, ADEKOYA A F, WEYORI B A, et al. Applications of artificial intelligence in engineering and manufacturing: A systematic review[J]. Journal of Intelligent Manufacturing, 2022, 33(6): 1581–1601. doi: 10.1007/s10845-021-01771-6.
    [9] 刘雷, 刘大卫, 王晓光, 等. 无人机集群与反无人机集群发展现状及展望[J]. 航空学报, 2022, 43(S1): 726908. doi: 10.7527/S1000-6893.2022.26908.

    LIU Lei, LIU Dawei, WANG Xiaoguang, et al. Development status and outlook of UAV clusters and anti-UAV clusters[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(S1): 726908. doi: 10.7527/S1000-6893.2022.26908.
    [10] LIU Yunxiao, WANG Yiming, LI Han, et al. Runway-free recovery methods for fixed-wing UAVs: A comprehensive review[J]. Drones, 2024, 8(9): 463. doi: 10.3390/drones8090463.
    [11] SHAHZAD M M, SAEED Z, AKHTAR A, et al. A review of swarm robotics in a nutshell[J]. Drones, 2023, 7(4): 269. doi: 10.3390/drones7040269.
    [12] ZAITSEVA E, LEVASHENKO V, MUKHAMEDIEV R, et al. Review of reliability assessment methods of drone swarm (fleet) and a new importance evaluation based method of drone swarm structure analysis[J]. Mathematics, 2023, 11(11): 2551. doi: 10.3390/math11112551.
    [13] SANKEY D W E and PORTUGAL S J. Influence of behavioural and morphological group composition on pigeon flocking dynamics[J]. Journal of Experimental Biology, 2023, 226(15): jeb245776. doi: 10.1242/jeb.245776.
    [14] BALLERINI M, CABIBBO N, CANDELIER R, et al. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study[J]. Proceedings of the National Academy of Sciences of the United States of America, 2008, 105(4): 1232–1237. doi: 10.1073/pnas.0711437105.
    [15] 罗琪楠, 段海滨, 范彦铭. 鸽群运动模型稳定性及聚集特性分析[J]. 中国科学: 技术科学, 2019, 49(6): 652–660. doi: 10.1360/N092017-00320.

    LUO Qi'nan, DUAN Haibin, and FAN Yanming. Analysis on stability and aggregation behavior of pigeon collective model[J]. Scientia Sinica Technologica, 2019, 49(6): 652–660. doi: 10.1360/N092017-00320.
    [16] HUO Mengzhen, DUAN Haibin, and DING Xilun. Manned aircraft and unmanned aerial vehicle heterogeneous formation flight control via heterogeneous pigeon flock consistency[J]. Unmanned Systems, 2021, 9(3): 227–236. doi: 10.1142/S2301385021410053.
    [17] CAO Shiyue, LEE C Y, DUAN Haibin, et al. Quadrotor swarm flight experimentation inspired by pigeon flock topology[C]. 2019 IEEE 15th International Conference on Control and Automation (ICCA), Edinburgh, UK, 2019: 657–662. doi: 10.1109/ICCA.2019.8899745.
    [18] HANG Xu and YIN Wang. Target assignment of heterogeneous multi-UAVs based on pigeon-inspired optimization[C]. Proceedings of 2020 International Conference on Guidance, Navigation and Control, Tianjin, China, 2020: 3987–3998. doi: 10.1007/978-981-15-8155-7_333.
    [19] PAN Chengsheng, SI Zenghui, DU Xiuli, et al. A four-step decision-making grey wolf optimization algorithm[J]. Soft Computing, 2021, 25(22): 14375–14391. doi: 10.1007/s00500-021-06194-2.
    [20] MADDILETI T, SALENDRA G, and SIVAPPAGARI C M R. Design optimization of power and area of two-stage CMOS operational amplifier utilizing chaos grey wolf technique[J]. International Journal of Advanced Computer Science and Applications, 2020, 11(7): 465–479. doi: 10.14569/IJACSA.2020.0110760.
    [21] KRAIEM H, AYMEN F, YAHYA L, et al. A comparison between particle swarm and grey wolf optimization algorithms for improving the battery autonomy in a photovoltaic system[J]. Applied Sciences, 2021, 11(16): 7732. doi: 10.3390/app11167732.
    [22] BAI Xiaotong, ZHENG Yuefeng, LU Yang, et al. Chain hybrid feature selection algorithm based on improved Grey Wolf Optimization algorithm[J]. PLoS One, 2024, 19(10): e0311602. doi: 10.1371/journal.pone.0311602.
    [23] PHADKE A and MEDRANO F A. Towards resilient UAV swarms–A breakdown of resiliency requirements in UAV swarms[J]. Drones, 2022, 6(11): 340. doi: 10.3390/drones6110340.
    [24] HANG Haotian, HUANG Chenchen, BARNETT A, et al. Self-reorganization and information transfer in massive schools of fish[EB/OL]. https://arxiv.org/abs/2505.05822, 2025.
    [25] WU Husheng, PENG Qiang, SHI Meimei, et al. A survey of UAV swarm task allocation based on the perspective of coalition formation[J]. International Journal of Swarm Intelligence Research, 2022, 13(1): 1–22. doi: 10.4018/IJSIR.311499.
    [26] QIN Boyu, ZHANG Dong, TANG Shuo, et al. Distributed grouping cooperative dynamic task assignment method of UAV swarm[J]. Applied Sciences, 2022, 12(6): 2865. doi: 10.3390/app12062865.
    [27] 陈鹏宇. 基于深度强化学习的集群行为建模研究[D]. [硕士论文], 大连海洋大学, 2023. doi: 10.27821/d.cnki.gdlhy.2023.000385.

    CHEN Pengyu. Research on collective behavior modeling based on deep reinforcement learning[D]. [Master dissertation], Dalian Ocean University, 2023. doi: 10.27821/d.cnki.gdlhy.2023.000385.
    [28] YIN Jia, CHAN Yanghao, DA JORNADA F H, et al. Analyzing and predicting non-equilibrium many-body dynamics via dynamic mode decomposition[J]. Journal of Computational Physics, 2023, 477: 111909. doi: 10.1016/j.jcp.2023.111909.
    [29] HANSEN E, BRUNTON S L, and SONG Zhuoyuan. Swarm modeling with dynamic mode decomposition[J]. IEEE Access, 2022, 10: 59508–59521. doi: 10.1109/ACCESS.2022.3179414.
    [30] FUJII K, KAWASAKI T, INABA Y, et al. Prediction and classification in equation-free collective motion dynamics[J]. PLoS Computational Biology, 2018, 14(11): e1006545. doi: 10.1371/journal.pcbi.1006545.
    [31] XIAO Yandong, LEI Xiaokang, ZHENG Zhicheng, et al. Perception of motion salience shapes the emergence of collective motions[J]. Nature Communications, 2024, 15(1): 4779. doi: 10.1038/s41467-024-49151-x.
    [32] 刘明雍, 雷小康, 杨盼盼, 等. 群集运动的理论建模与实证分析[J]. 科学通报, 2014, 59(25): 2464–2483. doi: 10.1360/N972013-00045.

    LIU Mingyong, LEI Xiaokang, YANG Panpan, et al. Progress of theoretical modelling and empirical studies on collective motion[J]. Chinese Science Bulletin, 2014, 59(25): 2464–2483. doi: 10.1360/N972013-00045.
    [33] COUZIN I D, KRAUSE J, JAMES R, et al. Collective memory and spatial sorting in animal groups[J]. Journal of Theoretical Biology, 2002, 218(1): 1–11. doi: 10.1006/jtbi.2002.3065.
    [34] 邱华鑫, 段海滨, 范彦铭, 等. 鸽群交互模式切换模型及其同步性分析[J]. 智能系统学报, 2020, 15(2): 334–343. doi: 10.11992/tis.201904052.

    QIU Huaxin, DUAN Haibin, FAN Yanming, et al. Pigeon flock interaction pattern switching model and its synchronization analysis[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 334–343. doi: 10.11992/tis.201904052.
    [35] VICSEK T, CZIRÓK A, BEN-JACOB E, et al. Novel type of phase transition in a system of self-driven particles[J]. Physical Review Letters, 1995, 75(6): 1226–1229. doi: 10.1103/PhysRevLett.75.1226.
    [36] BUHL C, SUMPTER D J T, COUZIN I D, et al. From disorder to order in marching locusts[J]. Science, 2006, 312(5778): 1402–1406. doi: 10.1126/science.1125142.
    [37] CAVAGNA A and GIARDINA I. Bird flocks as condensed matter[J]. Annual Review of Condensed Matter Physics, 2014, 5: 183–207. doi: 10.1146/annurev-conmatphys-031113-133834.
    [38] QI Jingtao, BAI Liang, WEI Yingmei, et al. Emergence of adaptation of collective behavior based on visual perception[J]. IEEE Internet of Things Journal, 2023, 10(12): 10368–10384. doi: 10.1109/JIOT.2023.3238162.
    [39] ATTANASI A, CAVAGNA A, DEL CASTELLO L, et al. Emergence of collective changes in travel direction of starling flocks from individual birds' fluctuations[J]. Journal of the Royal Society Interface, 2015, 12(108): 20150319. doi: 10.1098/rsif.2015.0319.
    [40] 邱浩楠, 何明, 韩伟, 等. 一种仿鸟群行为的无人机集群相变控制方法[J]. 现代防御技术, 2025, 53(1): 11–22. doi: 10.3969/j.issn.1009-086x.2025.01.002.

    QIU Haonan, HE Ming, HAN Wei, et al. A phase transition control method for UAV swarm based on birds’ behaviors[J]. Modern Defense Technology, 2025, 53(1): 11–22. doi: 10.3969/j.issn.1009-086x.2025.01.002.
    [41] LIU Sicong, HE Ming, HAN Wei, et al. Distributed control algorithm for multi-agent cooperation: Leveraging spatial information perception[J]. International Journal of Robust and Nonlinear Control, 2025, 36(1): 312–328. doi: 10.1002/rnc.70138.
    [42] CHEN Haotian, HE Ming, LIU Jintao, et al. A novel fractional-order flocking algorithm for large-scale UAV swarms[J]. Complex & Intelligent Systems, 2023, 9(6): 6831–6844. doi: 10.1007/s40747-023-01107-2.
    [43] 段海滨, 尤灵辰, 范彦铭, 等. 仿鸟群自推进机制的无人机集群相变控制[J]. 自动化学报, 2025, 51(5): 960–971. doi: 10.16383/j.aas.c240598.

    DUAN Haibin, YOU Lingchen, FAN Yanming, et al. Phase transition control of UAV swarm based on bird-inspired self-propelled mechanism[J]. Acta Automatica Sinica, 2025, 51(5): 960–971. doi: 10.16383/j.aas.c240598.
    [44] WANG Ling and CHEN Guanrong. Synchronization of multi-agent systems with metric-topological interactions[J]. Chaos: An Interdisciplinary Journal of Nonlinear Science, 2016, 26(9): 094809. doi: 10.1063/1.4955086.
    [45] EL-FERIK S. Biologically based control of a fleet of unmanned aerial vehicles facing multiple threats[J]. IEEE Access, 2020, 8: 107146–107160. doi: 10.1109/ACCESS.2020.3000774.
    [46] AZZAM R, BOIKO I, and ZWEIRI Y. Swarm cooperative navigation using centralized training and decentralized execution[J]. Drones, 2023, 7(3): 193. doi: 10.3390/drones7030193.
    [47] 夏家伟, 刘志坤, 朱旭芳, 等. 基于多智能体强化学习的无人艇集群集结方法[J]. 北京航空航天大学学报, 2023, 49(12): 3365–3376. doi: 10.13700/j.bh.1001-5965.2022.0088.

    XIA Jiawei, LIU Zhikun, ZHU Xufang, et al. A coordinated rendezvous method for unmanned surface vehicle swarms based on multi-agent reinforcement learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(12): 3365–3376. doi: 10.13700/j.bh.1001-5965.2022.0088.
    [48] PAPADOPOULOU M, HILDENBRANDT H, and HEMELRIJK C K. Diffusion during collective turns in bird flocks under predation[J]. Frontiers in Ecology and Evolution, 2023, 11: 1198248. doi: 10.3389/fevo.2023.1198248.
    [49] REYNOLDS C W. Flocks, herds and schools: A distributed behavioral model[C]. The 14th Annual Conference on Computer Graphics and Interactive Techniques, Anaheim, USA, 1987: 25–34. doi: 10.1145/37401.37406.
    [50] KHATIB O. Real-time obstacle avoidance for manipulators and mobile robots[J]. The International Journal of Robotics Research, 1986, 5(1): 90–98. doi: 10.1177/027836498600500106.
    [51] 朱许, 张博涵, 王正宁, 等. 基于深度强化学习的无人机集群编队避障控制[J]. 飞行力学, 2025, 43(2): 22–28. doi: 10.13645/j.cnki.f.d.20250214.002.

    ZHU Xu, ZHANG Bohan, WANG Zhengning, et al. Obstacle avoidance control of UAV cluster formation based on deep reinforcement learning[J]. Flight Dynamics, 2025, 43(2): 22–28. doi: 10.13645/j.cnki.f.d.20250214.002.
    [52] 陈泽坤, 何杏宇. 一种无人机编队控制方法研究与仿真[J]. 建模与仿真, 2024, 13(3): 2662–2672. doi: 10.12677/mos.2024.133242.

    CHEN Zekun and HE Xingyu. Research and simulation of a UAV formation control method[J]. Modeling and Simulation, 2024, 13(3): 2662–2672. doi: 10.12677/mos.2024.133242.
    [53] DONG Zhaoqi, WU Qizhen, and CHEN Lei. Reinforcement learning-based formation pinning and shape transformation for swarms[J]. Drones, 2023, 7(11): 673. doi: 10.3390/drones7110673.
    [54] WANG Chengjie, DENG Juan, ZHAO Hui, et al. Effect of Q-learning on the evolution of cooperation behavior in collective motion: An improved Vicsek model[J]. Applied Mathematics and Computation, 2024, 482: 128956. doi: 10.1016/j.amc.2024.128956.
    [55] JIN Weiqiang, TIAN Xingwu, SHI Bohang, et al. Enhanced UAV pursuit-evasion using Boids Modelling: A synergistic integration of bird swarm intelligence and DRL[J]. Computers, Materials and Continua, 2024, 80(3): 3523–3553. doi: 10.32604/cmc.2024.055125.
    [56] ZHAO Feifei, ZENG Yi, HAN Bing, et al. Nature-inspired self-organizing collision avoidance for drone swarm based on reward-modulated spiking neural network[J]. Patterns, 2022, 3(11): 100611. doi: 10.1016/J.PATTER.2022.100611.
    [57] 谢觉非. 城市物流场景下基于复合人工势场的无人机避障控制技术研究[D]. [硕士论文], 电子科技大学, 2025. doi: 10.27005/d.cnki.gdzku.2025.004243.

    XIE Juefei. Research on UAV obstacle avoidance control technology based on composite artificial potential field in urban logistics scenarios[D]. [Master dissertation], University of Electronic Science and Technology of China, 2025. doi: 10.27005/d.cnki.gdzku.2025.004243.
    [58] ABPEIKAR S, KASMARIK K, and GARRATT M. Reinforcement learning for collective motion tuning in the presence of extrinsic goals[C]. 35th Australasian Joint Conference on Artificial Intelligence, Perth, Australia, 2022: 761–774. doi: 10.1007/978-3-031-22695-3_53.
    [59] ZENG Qingli and NAIT-ABDESSELAM F. Multi-agent reinforcement learning-based extended Boid modeling for drone swarms[C]. ICC 2024-IEEE International Conference on Communications, Denver, USA, 2024: 1551–1556. doi: 10.1109/ICC51166.2024.10622479.
    [60] LIU Zhijun, LI Jie, SHEN Jian, et al. Leader–follower UAVs formation control based on a deep Q-network collaborative framework[J]. Scientific Reports, 2024, 14(1): 4674. doi: 10.1038/s41598-024-54531-w.
    [61] TANG Ruipeng, TANG Jianrui, TALIP M S A, et al. Enhanced multi agent coordination algorithm for drone swarm patrolling in durian orchards[J]. Scientific Reports, 2025, 15(1): 9139. doi: 10.1038/s41598-025-88145-7.
    [62] GUO Yunxiao, XIE Xinjia, ZHAO Runhao, et al. Cooperation and competition: Flocking with evolutionary multi-agent reinforcement learning[C]//International Conference on Neural Information Processing. Cham: Springer International Publishing, 2022: 271–283. doi: 10.1007/978-3-031-30105-6_23.
    [63] HAHN C, PHAN T, GABOR T, et al. Emergent escape-based flocking behavior using multi-agent reinforcement learning[C]. Artificial Life Conference Proceedings, 2019: 598–605. doi: 10.1162/isal_a_00226.
    [64] LAUDENZI G. Multi-agent deep reinforcement learning for drone swarms in static and dynamic environments[D]. [Master dissertation], University of Bologna, 2024.
    [65] ABPEIKAR S, KASMARIK K, GARRATT M, et al. Automatic collective motion tuning using actor-critic deep reinforcement learning[J]. Swarm and Evolutionary Computation, 2022, 72: 101085. doi: 10.1016/j.swevo.2022.101085.
    [66] WANG Jun, ZHANG Yuchen, HE Leimin, et al. A bio-inspired adaptive formation architecture based on multi-agents with application to UAV swarm[C]. 2024 IEEE International Conference on Unmanned Systems (ICUS), Nanjing, China, 2024: 908–914. doi: 10.1109/ICUS61736.2024.10840152.
    [67] WANG Dongzi, DING Bo, and FENG Dawei. Meta reinforcement learning with generative adversarial reward from expert knowledge[C]. 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), Dalian, China, 2020: 1–7. doi: 10.1109/ICISCAE51034.2020.9236869.
    [68] QIAN Feng, SU Kai, LIANG Xin, et al. Task assignment for UAV swarm saturation attack: A deep reinforcement learning approach[J]. Electronics, 2023, 12(6): 1292. doi: 10.3390/electronics12061292.
    [69] LI Chengshu, ZHANG Ruohan, WONG J, et al. BEHAVIOR-1K: A human-centered, embodied AI benchmark with 1,000 everyday activities and realistic simulation[EB/OL]. https://arxiv.org/abs/2403.09227, 2024.
    [70] ANTONELO E A, COUTO G C K, and MÖLLER C. Exploring multimodal implicit behavior learning for vehicle navigation in simulated cities[EB/OL]. https://arxiv.org/abs/2509.15400, 2025.
    [71] CHI Pei, WEI Jiahong, WU Kun, et al. A bio-inspired decision-making method of UAV swarm for attack-defense confrontation via multi-agent reinforcement learning[J]. Biomimetics, 2023, 8(2): 222. doi: 10.3390/biomimetics8020222.
    [72] YUE Longfei, YANG Rennong, ZUO Jialiang, et al. Unmanned aerial vehicle swarm cooperative decision-making for SEAD mission: A hierarchical multiagent reinforcement learning approach[J]. IEEE Access, 2022, 10: 92177–92191. doi: 10.1109/ACCESS.2022.3202938.
    [73] ARRANZ R, CARRAMIÑANA D, DE MIGUEL G, et al. Application of deep reinforcement learning to UAV swarming for ground surveillance[J]. Sensors, 2023, 23(21): 8766. doi: 10.3390/s23218766.
    [74] CAI He, MA Fu, NI Ruifeng, et al. Bio-inspired swarm confrontation algorithm for complex hilly terrains[J]. Biomimetics, 2025, 10(5): 257. doi: 10.3390/biomimetics10050257.
    [75] WEI Xiaolong, CUI Wenpeng, HUANG Xianglin, et al. Hierarchical RNNs with graph policy and attention for drone swarm[J]. Journal of Computational Design and Engineering, 2024, 11(2): 314–326. doi: 10.1093/jcde/qwae031.
    [76] TAPPLER M, LOPEZ-MIGUEL I D, TSCHIATSCHEK S, et al. Rule-guided reinforcement learning policy evaluation and improvement[EB/OL]. https://arxiv.org/abs/2503.09270, 2025.
    [77] ZHANG Xiaorong, WANG Yufeng, DING Wenrui, et al. Bio-inspired fission–fusion control and planning of unmanned aerial vehicles swarm systems via reinforcement learning[J]. Applied Sciences, 2024, 14(3): 1192. doi: 10.3390/app14031192.
    [78] XU Dan and CHEN Gang. The research on intelligent cooperative combat of UAV cluster with multi-agent reinforcement learning[J]. Aerospace Systems, 2022, 5(1): 107–121. doi: 10.1007/s42401-021-00105-x.
出版历程
  • 收稿日期:  2025-10-16
  • 修回日期:  2025-11-18
  • 录用日期:  2025-12-02
  • 网络出版日期:  2025-12-09
