Citation: HE Ming, WU Jingjing, HAN Wei, LIU Sicong, PAN Pan, XIA Hengyu. Bionic Behavior Modeling Method for Unmanned Aerial Vehicle Swarms Empowered by Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251103

Bionic Behavior Modeling Method for Unmanned Aerial Vehicle Swarms Empowered by Deep Reinforcement Learning

doi: 10.11999/JEIT251103 cstr: 32379.14.JEIT251103
Funds:  The National Natural Science Foundation of China (62273356), The National Talent Program (2022-JCJQ-ZQ-001), The National Key Research and Development Program (2024YFF140140)
  • Received Date: 2025-10-16
  • Accepted Date: 2025-12-02
  • Rev Recd Date: 2025-11-18
  • Available Online: 2025-12-09
Significance  Unmanned Aerial Vehicle (UAV) swarm technology is a core driver of low-altitude economic development and of the evolution of intelligent unmanned systems, yielding cooperative effects greater than the sum of individual UAV capabilities in disaster response, environmental monitoring, and logistics distribution. As mission scenarios shift toward dynamic heterogeneity, strong interference, and large-scale deployment, traditional centralized control architectures, although theoretically feasible, are difficult to implement in practice and remain a major constraint on engineering application. Bionic Swarm Intelligence (BSI), a distributed intelligent paradigm that simulates the self-organization, elastic reconfiguration, and cooperative behavior of biological swarms, offers a path to overcoming these limitations. The integration of Deep Reinforcement Learning (DRL) enables a transition from static behavior simulation to adaptive autonomous learning and decision-making. The combined BSI-DRL framework allows UAV swarms to optimize cooperative strategies through data-driven interaction, addressing the limited adaptability of manually designed bionic rules. Clarifying the progress and challenges of BSI-DRL-based UAV swarm modeling is therefore essential for supporting engineering transformation and improving practical system performance.

Progress  The progress of BSI-DRL-driven UAV swarm behavior modeling is summarized from four aspects.

(1) Concept and core characteristics of BSI: BSI, a biology-oriented subset of Swarm Intelligence (SI), is defined by four characteristics: distributed control without dependence on a central command, self-organization through spontaneous disorder-to-order transition, robustness through functional maintenance under disturbances, and adaptability through dynamic strategy optimization in complex environments.

(2) Three-stage paradigm transition of BSI: (a) Before 2010 (rule transplantation stage), work centered on applying fixed bionic algorithms such as particle swarm optimization and biological models (e.g., Boids, Vicsek) to UAV path planning, with SI dependent on preset rules (Fig. 2). (b) From 2010 to 2020 (systematic decentralized control stage), studies shifted toward systematic design and decentralized control theory, enabling a transition from simulation to physical verification but showing limited adaptability under dynamic conditions (Fig. 2). (c) Since 2020 (AI-enhanced autonomous learning stage), the integration of DRL has enabled a transition to autonomous learning and decision-making, allowing UAV swarms to develop advanced cooperative strategies in unknown environments (Fig. 2).

(3) Typical biological swarm mechanisms and bionic mapping: Four representative biological mechanisms provide bionic prototypes. (a) Pigeon flock hierarchy, characterized by a three-tier coupled structure, supports formation control and cooperation under interference. (b) Wolf pack hunting, structured as four-stage dynamic collaboration, enables efficient task division. (c) Fish school self-repair through decentralized topology adjustment enhances swarm robustness. (d) Honeybee colony division of labor, based on decentralized decision-making and dynamic role assignment, improves task efficiency. Bionic mapping proceeds through three steps: decomposition of the biological prototype and extraction of behavioral features using dynamic mode decomposition, social interaction filtering, and group state classification (Fig. 5; see the DMD sketch after the abstract); abstraction of behavior rules and mathematical modeling using approaches such as differential equations and graph theory; and algorithmic adaptation and intelligent enhancement, converting the mathematical models into executable rules and integrating DRL.

(4) Core BSI-DRL modeling directions: Three main technical paths are summarized and compared (Table 1). (a) Bionic-rule parameterization with DRL optimization (shallow fusion): DRL optimizes key parameters of bionic models, such as the attraction-repulsion weights in Boids, preserving biological robustness but exhibiting instability during large-swarm training (see the code sketch after the abstract). (b) Generative bionic-rule multi-agent reinforcement learning (middle fusion): bio-inspired reward functions guide the autonomous emergence of cooperative rules, improving adaptability but reducing interpretability due to "black-box" characteristics. (c) Dynamic role assignment with hierarchical DRL (deep fusion): a three-tier architecture comprising global planning, group role assignment, and individual execution reduces decision-making complexity in heterogeneous swarms and strengthens multi-task adaptability, although multi-level coordination remains challenging. A scenario-adaptation logic based on swarm scale, environmental dynamics, and task heterogeneity, together with a multi-method fusion strategy, is also proposed.

Conclusions  This study clarifies the theoretical framework and research progress of BSI-DRL-based UAV swarm behavior modeling. By simulating biological swarm mechanisms, BSI addresses the limitations of traditional centralized control in scalability, dynamic adaptability, and system credibility, and DRL further enables a shift toward autonomous learning. Horizontal comparison indicates complementary strengths across the three core directions: parameterization optimization maintains basic robustness, generative methods enhance dynamic adaptability, and hierarchical collaboration improves performance in heterogeneous multi-task settings. The proposed scenario-adaptation logic, which applies parameterization to small-to-medium and static scenarios, generative methods to medium-to-large and dynamic scenarios, and hierarchical collaboration to heterogeneous multi-task missions, together with the multi-method fusion strategy, offers feasible engineering pathways. Key engineering bottlenecks are also identified, including inconsistent environmental perception, unbalanced multi-objective decision-making, and limited system interpretability, providing a basis for targeted technical advancement.

Prospects  Future work focuses on five directions to enhance the capacity of BSI-DRL for complex UAV swarm tasks. (1) Cross-species biological mechanism integration: combining the advantages of different biological prototypes to construct adaptive hybrid systems. (2) BSI-DRL closed-loop collaborative evolution: establishing a bidirectional interaction framework in which BSI provides initial strategies and safety boundaries while DRL refines bionic rules online. (3) Bird-swarm-like phase-transition control fused with DRL: using phase-transition order parameters as DRL observation indicators to improve parameter interpretability (see the order-parameter formulation after the abstract). (4) Digital-twin and hardware-in-the-loop training and verification: building high-fidelity digital-twin environments to narrow the simulation-to-reality gap. (5) Real-scenario performance evaluation and field deployment: conducting field tests to assess algorithm effectiveness and guide theoretical refinement.
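The behavioral-feature-extraction step in Progress item (3) mentions dynamic mode decomposition (DMD). As a rough, self-contained illustration of how successive swarm-state snapshots could be decomposed into modes and growth rates, the following exact-DMD sketch is given; the function name, snapshot layout, and rank truncation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def exact_dmd(snapshots, rank=None):
    """Exact DMD of a snapshot matrix whose columns are successive swarm states.

    snapshots : (n_features, n_times) array, e.g. stacked agent positions/velocities.
    Returns the temporal eigenvalues (growth/oscillation) and the spatial DMD modes.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]            # snapshot pairs x_k -> x_{k+1}
    U, s, Vh = np.linalg.svd(X, full_matrices=False)      # low-rank basis of the data
    if rank is not None:                                   # optional truncation
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)   # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)                    # temporal eigenvalues
    modes = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W         # exact DMD modes
    return eigvals, modes

# Illustrative use: 20 agents, 2-D positions flattened into 40 features, 100 time steps.
rng = np.random.default_rng(0)
data = np.cumsum(rng.standard_normal((40, 100)), axis=1)  # placeholder trajectories
eigvals, modes = exact_dmd(data, rank=6)
print(eigvals[:3])
```

In a bionic-mapping pipeline, the dominant modes would be candidates for the behavioral features (e.g., collective turning or expansion patterns) that are later abstracted into rules.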
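To make the shallow-fusion direction (4)(a) concrete, the sketch below shows a Boids-style velocity update whose separation, alignment, and cohesion weights are left as free parameters; in the scheme described above, a DRL policy would output these weights while the bionic rule itself stays fixed. The function name, 2-D point-mass model, perception radius, and weight values are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def boids_step(pos, vel, weights, radius=5.0, dt=0.1, v_max=2.0):
    """One Boids-style update; weights = (w_sep, w_align, w_coh) would come from a DRL policy."""
    w_sep, w_align, w_coh = weights
    acc = np.zeros_like(vel)
    for i in range(len(pos)):
        d = pos - pos[i]                                   # offsets from agent i to all agents
        dist = np.linalg.norm(d, axis=1)
        nbrs = (dist > 0) & (dist < radius)                # neighbours within perception radius
        if not np.any(nbrs):
            continue
        sep = -np.sum(d[nbrs] / dist[nbrs, None] ** 2, axis=0)   # repulsion from close neighbours
        align = np.mean(vel[nbrs], axis=0) - vel[i]               # match neighbours' mean velocity
        coh = np.mean(pos[nbrs], axis=0) - pos[i]                 # attraction to local centroid
        acc[i] = w_sep * sep + w_align * align + w_coh * coh
    vel = vel + dt * acc
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    vel = vel * np.minimum(1.0, v_max / np.maximum(speed, 1e-9))  # clip speeds
    return pos + dt * vel, vel

# Illustrative rollout with a weight triple that a trained policy might propose.
rng = np.random.default_rng(1)
pos = rng.uniform(0, 20, size=(30, 2))
vel = rng.uniform(-1, 1, size=(30, 2))
for _ in range(100):
    pos, vel = boids_step(pos, vel, weights=(1.0, 0.5, 0.3))
```

Because only the three scalar weights are learned, the action space stays small and the underlying bionic rule keeps its interpretable structure, which is the trade-off the shallow-fusion path exploits.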
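For the phase-transition direction in the Prospects, one standard observation candidate, assuming a Vicsek-type alignment model (this formulation comes from the collective-motion literature, not from the paper itself), is the polar order parameter, which measures how aligned the N agents' velocities are:

```latex
% Polar order parameter for N agents with velocities \mathbf{v}_i and common speed v_0.
% \varphi \approx 0: disordered motion; \varphi \approx 1: fully aligned (flocking) motion.
\varphi(t) = \frac{1}{N v_0}\left\| \sum_{i=1}^{N} \mathbf{v}_i(t) \right\|
```

Feeding phi(t) and its rate of change into the DRL observation vector is one way an order parameter could serve as an interpretable indicator of the swarm's proximity to the order-disorder transition.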