Automatic Generation of General Electromagnetic Countermeasures under an Unknown Game Paradigm

WANG Qing, CHEN Qi, WANG Haozhi, ZHANG Feng, DONG Zhicheng

Citation: WANG Qing, CHEN Qi, WANG Haozhi, ZHANG Feng, DONG Zhicheng. Automatic Generation of General Electromagnetic Countermeasures under an Unknown Game Paradigm[J]. Journal of Electronics & Information Technology, 2023, 45(11): 4072-4082. doi: 10.11999/JEIT230848


doi: 10.11999/JEIT230848
Funds: The National Natural Science Foundation of China (61871282, U20A20162), The Key Research and Development Program of Xizang Autonomous Region, and the Science and Technology Major Project of Xizang Autonomous Region of China (XZ202201ZD0006G03)
Details
    Author information:

    WANG Qing: Female, Professor. Her research interests include new-system radar, high-speed and highly reliable communications, and intelligent information processing

    CHEN Qi: Female, M.S. candidate. Her research interests include intelligent signal processing and reinforcement learning

    WANG Haozhi: Male, Ph.D. candidate. His research interests include intelligent signal processing and reinforcement learning

    ZHANG Feng: Male, Senior Engineer. His research interests include intelligent information processing

    DONG Zhicheng: Male, Professor. His research interests include intelligent information processing and high-mobility wireless communications

    Corresponding author:

    DONG Zhicheng  dongzc666@163.com

  • CLC number: TN794

  • Abstract: Electromagnetic-domain confrontation is usually modeled as a zero-sum game, but when the operational environment changes, both sides of the zero-sum game must adapt to new, unknown tasks, and manually specified game rules no longer apply. To avoid hand-designing explicit game strategies, this paper proposes a Population-based Multi-Agent Electromagnetic Countermeasure method (PMAEC) that automatically generates general electromagnetic countermeasure strategies under an unknown game paradigm. First, on the multi-agent confrontation platform (MaCA), which simulates electromagnetic game confrontation environments, a meta-game framework is used to model the optimization of the electromagnetic countermeasure strategy population and to decompose it into inner and outer optimization objectives. Second, combined with meta-learning techniques, the meta-solver model is optimized through Automatic Curriculum Learning (ACL). Finally, best-response strategies are updated iteratively to expand and strengthen the strategy population so that it can cope with game challenges of varying difficulty. Simulation results on the MaCA platform show that the proposed PMAEC method drives the meta-game to converge to lower exploitability, and that the trained population of electromagnetic countermeasure strategies generalizes to more complex zero-sum games, allowing a model trained in simple scenarios to scale to large games in complex electromagnetic confrontation environments and enhancing the generalization ability of electromagnetic countermeasure strategies.
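The metric that anchors these results is exploitability: how much a best response can still gain against the meta-strategy mixed over the current population. The sketch below shows one common form of this computation for a finite, symmetric zero-sum game; the payoff matrix `M`, the helper `exploitability`, and the rock-paper-scissors example are illustrative assumptions, not the paper's Eq. (1) or its MaCA-based evaluation.

```python
import numpy as np

def exploitability(M: np.ndarray, sigma: np.ndarray) -> float:
    """Exploitability of a meta-strategy sigma in a symmetric zero-sum game.

    M[i, j] is the expected payoff of population member i against member j
    (antisymmetric for a symmetric zero-sum game), and sigma is the mixture
    over the population. The game value is 0, so the gap is simply the best
    pure-response payoff against sigma.
    """
    expected_payoffs = M @ sigma          # payoff of each pure strategy vs. sigma
    return float(np.max(expected_payoffs))

# Toy usage: a rock-paper-scissors-like 3-strategy population.
M = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
uniform = np.ones(3) / 3
print(exploitability(M, uniform))         # 0.0: the uniform mixture is unexploitable here
```

A uniform mixture over the three cyclic strategies has zero exploitability, which is the limit the growing strategy population is pushed toward during training.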
  • Figure 1  Schematic of the MaCA electromagnetic confrontation combat scenario

    Figure 2  Flowchart of the PMAEC algorithm

    Figure 3  Structure of the GRU-based meta-solver

    Figure 4  Exploitability during training in the heterogeneous-map confrontation environment

    Figure 5  Exploitability during training in the homogeneous-map confrontation environment

    Figure 6  Final exploitability on the heterogeneous map tested at different strategy dimensions

    Figure 7  Final exploitability on the homogeneous map tested at different strategy dimensions

    Figure 8  Final exploitability when generalizing from the heterogeneous map to the homogeneous map at different dimensions

    Figure 9  Effect of the backpropagation window size on the best exploitability (heterogeneous map)

    Figure 10  Effect of the backpropagation window size on the best exploitability (homogeneous map)

    Figure 11  Effect of different models on exploitability in the heterogeneous-map environment

    Figure 12  Effect of different models on exploitability in the homogeneous-map environment

    Algorithm 1  Population-based Multi-Agent Electromagnetic Countermeasure algorithm (PMAEC)
     (1) Given the game distribution $ p(\mathcal{G}) $, the learning rates $ \eta $ and $ \mu $, and the time window $ T $, initialize the strategy pool ${\varPhi ^0}$ and the meta-solver parameters $ \theta $
     (2) for each training iteration
     (3)  Sample $ K $ games from $ p(\mathcal{G}) $
     (4)    for each game
     (5)     for each time step within the time window
     (6)     Compute the meta-strategy at time $ t - 1 $: ${\sigma ^{t - 1} } = {f_\theta }({ {\boldsymbol{M} }^{t - 1} })$
     (7)     Initialize the best-response strategy $ {\phi ^0} $
     (8)     Compute the best response $\phi _{{\rm{BR}}}^t$ according to Eq. (23)
     (9)     Add $\phi _{{\rm{BR}}}^t$ to the population ${\varPhi ^t} = {\varPhi ^{t - 1} } \cup \phi _{{\rm{BR}}}^t$
     (10)     end for
     (11)    Compute the meta-strategy at time $ T $: ${\sigma^T} = {f_\theta }({ {\boldsymbol{M} }^T})$
     (12)    Compute the exploitability ${\rm{Exp} }\left( {\sigma _\theta ^T,\varPhi _\theta ^T} \right)$ according to Eq. (1)
     (13)    end for
     (14)   Compute the meta-gradient ${\nabla _\theta }\mathcal{J}_{{\rm{out}}}^k(\theta )$ according to Eqs. (10), (24), and (25)
     (15)   Update the meta-solver parameters $ \theta $ according to Eq. (27)
     (16) end for
     (17) Store the current meta-solver model $ {f_\theta } $
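To make the control flow of Algorithm 1 concrete, the following sketch mirrors its loop structure on toy matrix games. The softmax `meta_solver`, the pure-strategy `best_response` oracle, the random `sample_game` generator, and the finite-difference update standing in for the meta-gradient of Eqs. (10), (24), (25), and (27) are all illustrative assumptions; the paper's implementation uses a GRU meta-solver and MaCA environments, neither of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_game(n=8):
    """A random symmetric zero-sum game as an antisymmetric payoff matrix."""
    A = rng.normal(size=(n, n))
    return A - A.T

def meta_solver(M, theta):
    """Stand-in for f_theta: a softmax over average payoffs with temperature theta."""
    scores = M.mean(axis=1) / max(theta, 1e-6)
    e = np.exp(scores - scores.max())
    return e / e.sum()

def best_response(game, population, sigma):
    """Index of the pure strategy with the highest payoff against the mixture sigma."""
    return int(np.argmax(game[:, population] @ sigma))

def exploitability(game, population, sigma):
    """Best pure-response payoff against sigma played over the population."""
    return float(np.max(game[:, population] @ sigma))

def run_window(game, theta, T=5):
    """Inner loop of Algorithm 1 on one game: grow the population for T steps,
    then score the final meta-strategy by its exploitability (step (12))."""
    population = [0]                                        # seed strategy pool, step (1)
    for _ in range(T):                                      # steps (5)-(10)
        M = game[np.ix_(population, population)]            # restricted meta-game payoffs
        sigma = meta_solver(M, theta)                       # step (6)
        population.append(best_response(game, population, sigma))  # steps (8)-(9)
    M = game[np.ix_(population, population)]
    sigma = meta_solver(M, theta)                           # step (11)
    return exploitability(game, population, sigma)          # step (12)

theta, mu, K, eps = 1.0, 0.05, 4, 1e-2
for iteration in range(100):                                # step (2)
    games = [sample_game() for _ in range(K)]               # step (3)
    # Steps (14)-(15): a zeroth-order (finite-difference) update standing in for
    # the meta-gradient computed from Eqs. (10), (24), (25), and (27).
    j_plus = np.mean([run_window(g, theta + eps) for g in games])
    j_minus = np.mean([run_window(g, theta - eps) for g in games])
    theta -= mu * (j_plus - j_minus) / (2 * eps)            # descend mean exploitability
```

The finite-difference step is only a placeholder: in the paper the meta-gradient is backpropagated through the time window, whose length corresponds to the backpropagation window size of [1, 3, 5] listed in Table 1.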

    Table 1  PMAEC algorithm parameters

    Parameter  Value
    Outer learning rate $ \mu $  0.01
    Meta-training iterations  100
    Meta batch size  5
    Model type  GRU
    Window size  [1, 3, 5]
    Inner learning rate $ \eta $  25.0
    Inner gradient iterations  5
    Exploitability learning rate  10.0
    Inner exploitability iterations  20
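For reference, the Table 1 settings map naturally onto a single configuration object. A minimal sketch follows, assuming hypothetical field names (`outer_lr`, `inner_lr`, and so on) rather than the paper's code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PMAECConfig:
    """Hyperparameters from Table 1 (field names are assumptions)."""
    outer_lr: float = 0.01               # outer learning rate mu
    meta_iterations: int = 100           # meta-training iterations
    meta_batch_size: int = 5             # games sampled per meta-update
    model_type: str = "GRU"              # meta-solver backbone
    window_sizes: List[int] = field(default_factory=lambda: [1, 3, 5])  # BPTT window
    inner_lr: float = 25.0               # inner learning rate eta
    inner_grad_iterations: int = 5       # inner gradient steps
    exploitability_lr: float = 10.0      # exploitability learning rate
    exploitability_iterations: int = 20  # inner exploitability iterations

config = PMAECConfig()
```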

    Table 2  Comparison of exploitability mean and standard deviation

    Algorithm   Heterogeneous map        Homogeneous map
                Mean     Std. dev.       Mean     Std. dev.
    PMAEC       0.0017   0.0064          0.0186   0.0079
    PSRO        0.0464   0.013           0.056    0.015
    Self-Play   0.0465   0.0133          0.0558   0.0151
Figures (12) / Tables (3)
Metrics
  • Article views: 415
  • Full-text HTML views: 160
  • PDF downloads: 97
  • Citations: 0
Publication history
  • Received: 2023-08-04
  • Revised: 2023-10-09
  • Available online: 2023-10-14
  • Published: 2023-11-28
