Automatic Generation of General Electromagnetic Countermeasures under an Unknown Game Paradigm

WANG Qing, CHEN Qi, WANG Haozhi, ZHANG Feng, DONG Zhicheng

Citation: WANG Qing, CHEN Qi, WANG Haozhi, ZHANG Feng, DONG Zhicheng. Automatic Generation of General Electromagnetic Countermeasures under an Unknown Game Paradigm[J]. Journal of Electronics & Information Technology, 2023, 45(11): 4072-4082. doi: 10.11999/JEIT230848

doi: 10.11999/JEIT230848
Funds: The National Natural Science Foundation of China (61871282, U20A20162), The Key Research and Development Program of Xizang Autonomous Region, and The Science and Technology Major Project of Xizang Autonomous Region of China (XZ202201ZD0006G03)
Details
    About the authors:

    WANG Qing: Female, Professor. Her research interests include new-system radar, high-speed and high-reliability communication, and intelligent information processing

    CHEN Qi: Female, M.S. candidate. Her research interests include intelligent signal processing and reinforcement learning

    WANG Haozhi: Male, Ph.D. candidate. His research interests include intelligent signal processing and reinforcement learning

    ZHANG Feng: Male, Senior Engineer. His research interest is intelligent information processing

    DONG Zhicheng: Male, Professor. His research interests include intelligent information processing and high-mobility wireless communication

    Corresponding author:

    DONG Zhicheng  dongzc666@163.com

  • CLC number: TN794
  • Abstract: Electromagnetic-space confrontation is usually modeled as a zero-sum game, but when the operational environment changes both sides of the game must adapt to new, unknown tasks, and manually specified game rules no longer apply. To avoid hand-designing explicit game strategies, this paper proposes a Population-based Multi-Agent Electromagnetic Countermeasure (PMAEC) method for the automatic generation of general electromagnetic countermeasure strategies under an unknown game paradigm. First, on the multi-agent confrontation platform MaCA, which simulates electromagnetic game environments, a meta-game framework is used to model the optimization of the electromagnetic countermeasure strategy population, and the problem is decomposed into inner and outer optimization objectives. Second, combined with meta-learning techniques, the meta-solver model is optimized through Automatic Curriculum Learning (ACL). Finally, by iteratively updating best-response strategies, the strategy population is expanded and strengthened to handle game challenges of different difficulty. Simulation results on the MaCA platform show that the proposed PMAEC method drives the meta-game to converge to lower exploitability, and that the trained population of electromagnetic countermeasure strategies generalizes to more complex zero-sum games, allowing models trained in simple scenarios to scale to large games in complex electromagnetic confrontation environments and enhancing the generalization ability of electromagnetic countermeasure strategies.
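    The exploitability used as the optimization target above (and computed in step (12) of Algorithm 1) is defined by Eq. (1) of the paper, which is not reproduced on this page. For reference, a standard form for a two-player zero-sum game, which the measure here presumably follows, is

    $$ \mathrm{Exp}(\sigma) = \max_{\pi_1} u_1(\pi_1, \sigma_2) + \max_{\pi_2} u_2(\sigma_1, \pi_2), $$

    where $u_i$ is player $i$'s expected payoff and $\sigma = (\sigma_1, \sigma_2)$ is the joint (meta-)strategy profile; the quantity is non-negative and equals zero exactly at a Nash equilibrium.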
  • Figure 1  Schematic of the MaCA electromagnetic confrontation combat scenario

    Figure 2  Flowchart of the PMAEC algorithm

    Figure 3  Structure of the GRU-based meta-solver

    Figure 4  Exploitability during training in the heterogeneous-map confrontation environment

    Figure 5  Exploitability during training in the homogeneous-map confrontation environment

    Figure 6  Final exploitability on the heterogeneous map tested at different policy dimensions

    Figure 7  Final exploitability on the homogeneous map tested at different policy dimensions

    Figure 8  Final exploitability when generalizing from the heterogeneous map to the homogeneous map at different dimensions

    Figure 9  Effect of the backpropagation window size on the best exploitability (heterogeneous map)

    Figure 10  Effect of the backpropagation window size on the best exploitability (homogeneous map)

    Figure 11  Effect of different models on exploitability in the heterogeneous-map environment

    Figure 12  Effect of different models on exploitability in the homogeneous-map environment

    Algorithm 1  Population-based Multi-Agent Electromagnetic Countermeasure algorithm (PMAEC)
     (1) Given the game distribution $p(\mathcal{G})$, learning rates $\eta$ and $\mu$, and time window $T$; initialize the policy pool
      $\varPhi^0$ and the meta-solver parameters $\theta$
     (2) for each training iteration
     (3)   Sample $K$ games from $p(\mathcal{G})$
     (4)   for each game
     (5)     for each time step $t$ in the window
     (6)       Compute the meta-strategy at step $t-1$: $\sigma^{t-1} = f_\theta(\boldsymbol{M}^{t-1})$
     (7)       Initialize the best-response policy $\phi^0$
     (8)       Compute the best response $\phi_{\mathrm{BR}}^t$ according to Eq. (23)
     (9)       Add $\phi_{\mathrm{BR}}^t$ to the population: $\varPhi^t = \varPhi^{t-1} \cup \phi_{\mathrm{BR}}^t$
     (10)    end for
     (11)    Compute the meta-strategy at step $T$: $\sigma^T = f_\theta(\boldsymbol{M}^T)$
     (12)    Compute the exploitability $\mathrm{Exp}\left(\sigma_\theta^T, \varPhi_\theta^T\right)$ according to Eq. (1)
     (13)  end for
     (14)  Compute the meta-gradient $\nabla_\theta \mathcal{J}_{\mathrm{out}}^k(\theta)$ according to Eqs. (10), (24), and (25)
     (15)  Update the meta-solver parameters $\theta$ according to Eq. (27)
     (16) end for
     (17) Store the current meta-solver model $f_\theta$
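    To make the inner/outer structure of Algorithm 1 concrete, the following is a minimal sketch of a PMAEC-style training loop on a toy antisymmetric matrix game rather than the MaCA environment. It is illustrative only: the names (MetaSolver, best_response, exploitability), the toy game, and the hyperparameter values are assumptions and not the authors' code; the GRU meta-solver mirrors Figure 3, the gradient-ascent best response stands in for Eq. (23), and the best pure-strategy payoff against the mixture stands in for the exploitability of Eq. (1).

```python
# Illustrative PMAEC-style loop on a toy zero-sum matrix game (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 8            # pure actions in the toy zero-sum game
POP_STEPS = 5      # time window T: population growth steps per sampled game
BR_ITERS = 20      # inner gradient steps for an approximate best response
MAX_POP = POP_STEPS + 1

class MetaSolver(nn.Module):
    """GRU meta-solver f_theta: meta-payoff matrix -> meta-strategy sigma."""
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(MAX_POP, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, meta_payoff):                 # meta_payoff: (k, k)
        k = meta_payoff.shape[0]
        rows = F.pad(meta_payoff, (0, MAX_POP - k)).unsqueeze(0)  # (1, k, MAX_POP)
        h, _ = self.gru(rows)                       # one GRU step per population member
        logits = self.head(h).squeeze(-1).squeeze(0)
        return torch.softmax(logits, dim=0)         # sigma over the current population

def meta_payoffs(G, pop):
    # Empirical meta-game: payoff of every population member against every other.
    return torch.stack([torch.stack([p @ G @ q for q in pop]) for p in pop])

def aggregate(pop, sigma):
    # Opponent mixture induced by the meta-strategy over the population.
    return (sigma.unsqueeze(1) * torch.stack(pop)).sum(dim=0)

def best_response(G, opponent, iters=BR_ITERS, lr=5.0):
    """Differentiable approximate best response via gradient ascent on logits."""
    logits = torch.zeros(DIM, requires_grad=True)
    for _ in range(iters):
        payoff = torch.softmax(logits, dim=0) @ G @ opponent
        grad, = torch.autograd.grad(payoff, logits, create_graph=True)
        logits = logits + lr * grad
    return torch.softmax(logits, dim=0)

def exploitability(G, mixture):
    # The toy game is antisymmetric (value 0), so the best pure-strategy payoff
    # against the mixture is a valid exploitability measure.
    return torch.max(G @ mixture)

solver = MetaSolver()
optimizer = torch.optim.Adam(solver.parameters(), lr=0.01)

for iteration in range(100):                        # outer (meta) loop
    A = torch.randn(DIM, DIM)
    G = A - A.T                                     # sample a zero-sum game
    pop = [torch.full((DIM,), 1.0 / DIM)]           # initial policy pool
    for _ in range(POP_STEPS):                      # inner auto-curriculum loop
        sigma = solver(meta_payoffs(G, pop))
        pop.append(best_response(G, aggregate(pop, sigma)))
    final_exp = exploitability(G, aggregate(pop, solver(meta_payoffs(G, pop))))
    optimizer.zero_grad()
    final_exp.backward()                            # meta-gradient through the window
    optimizer.step()
```

    In the full method the best responses are multi-agent policies trained inside MaCA rather than logits over a payoff matrix, so this toy loop reproduces only the control flow of Algorithm 1, not the policy optimization itself.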

    Table 1  PMAEC algorithm parameters

    Parameter                              Value
    Outer learning rate $\mu$              0.01
    Meta-training iterations               100
    Meta batch size                        5
    Model type                             GRU
    Window size                            [1, 3, 5]
    Inner learning rate $\eta$             25.0
    Inner gradient iterations              5
    Exploitability learning rate           10.0
    Inner exploitability iterations        20
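    For readers reimplementing the method, the Table 1 settings can be gathered into a single configuration object. The sketch below is illustrative only; the field names are hypothetical and not taken from the authors' code.

```python
# Hypothetical configuration mirroring Table 1 (field names are illustrative).
PMAEC_CONFIG = {
    "outer_lr_mu": 0.01,             # outer (meta-solver) learning rate
    "meta_train_iterations": 100,    # meta-training iterations
    "meta_batch_size": 5,            # games sampled per meta-update
    "model_type": "GRU",             # meta-solver backbone
    "window_sizes": [1, 3, 5],       # backpropagation window sizes evaluated
    "inner_lr_eta": 25.0,            # inner best-response learning rate
    "inner_grad_iterations": 5,      # gradient steps per best response
    "exploitability_lr": 10.0,       # learning rate for exploitability estimation
    "inner_exploitability_iterations": 20,
}
```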

    Table 2  Mean and standard deviation of exploitability

    Algorithm    Heterogeneous map        Homogeneous map
                 Mean      Std dev.       Mean      Std dev.
    PMAEC        0.0017    0.0064         0.0186    0.0079
    PSRO         0.0464    0.013          0.056     0.015
    Self-Play    0.0465    0.0133         0.0558    0.0151
Publication history
  • Received: 2023-08-04
  • Revised: 2023-10-09
  • Available online: 2023-10-14
  • Published: 2023-11-28
