Volume 45 Issue 11
Nov. 2023
WANG Qing, CHEN Qi, WANG Haozhi, ZHANG Feng, DONG Zhicheng. Automatic Generation of General Electromagnetic Countermeasures under an Unknown Game Paradigm[J]. Journal of Electronics & Information Technology, 2023, 45(11): 4072-4082. doi: 10.11999/JEIT230848

Automatic Generation of General Electromagnetic Countermeasures under an Unknown Game Paradigm

doi: 10.11999/JEIT230848
Funds: The National Natural Science Foundation of China (61871282, U20A20162), the Key Research and Development Program of Xizang Autonomous Region, and the Science and Technology Major Project of Xizang Autonomous Region of China (XZ202201ZD0006G03)
  • Received Date: 2023-08-04
  • Revised Date: 2023-10-09
  • Available Online: 2023-10-14
  • Publish Date: 2023-11-28
  • Abstract: Electromagnetic-space confrontation in electronic warfare is generally modeled as a zero-sum game. However, as the battlefield environment evolves, both sides of the game must adapt to new and unknown tasks, rendering manually designed game rules ineffective. A novel method called Population-based Multi-Agent Electromagnetic Countermeasure (PMAEC) is proposed to overcome the limitations of explicit game strategies and enable the automatic generation of general electromagnetic countermeasure policies under unknown game paradigms. First, a meta-game framework is used to model the optimization of the electromagnetic countermeasure policy population, which is decomposed into inner and outer optimization problems over the electromagnetic game environments of the Multi-agent Combat Arena (MaCA) platform. Second, the meta-solver model is optimized by combining Auto-Curriculum Learning (ACL) with meta-learning. PMAEC iteratively updates the best-response policy, expanding and strengthening the policy population to overcome the challenges posed by increasingly difficult games. Simulation results on the MaCA platform demonstrate that the proposed method achieves lower exploitability in the meta-game, and that the trained population of electromagnetic countermeasure policies generalizes to more complex zero-sum games. The approach thus extends training from simple scenarios to large-scale games in complex electromagnetic countermeasure environments, enhancing the generalization capability of the resulting strategies.
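    To make the training loop concrete, the sketch below shows a PSRO-style population expansion of the kind PMAEC builds on. A mixture sigma over the current population is judged by its exploitability, the payoff the best deviating policy obtains against sigma (zero at a Nash equilibrium of a symmetric zero-sum game). Everything here is an illustrative stand-in rather than the paper's implementation: a random symmetric zero-sum matrix game replaces the MaCA environment, a uniform mixture replaces the meta-solver learned with ACL and meta-learning, and an exact argmax replaces the reinforcement-learned best response.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for the electromagnetic game: a random symmetric
    # zero-sum matrix game with row-player payoff A, where A = -A.T.
    n_strategies = 50
    M = rng.normal(size=(n_strategies, n_strategies))
    A = M - M.T

    def meta_solver(sub_payoff):
        # Meta-solver stand-in: uniform mixture over the population.
        # (PMAEC instead learns this solver with ACL and meta-learning.)
        k = sub_payoff.shape[0]
        return np.ones(k) / k

    def best_response(population, sigma):
        # Oracle stand-in: exact best response among all pure strategies
        # against the mixture sigma. (PMAEC trains it with RL on MaCA.)
        return int(np.argmax(A[:, population] @ sigma))

    def exploitability(population, sigma):
        # Gain of the best deviation against sigma; zero when sigma is
        # a Nash equilibrium of this symmetric zero-sum game.
        return float(np.max(A[:, population] @ sigma))

    population = [0]  # seed the population with an arbitrary policy
    for t in range(10):  # outer loop: expand the policy population
        sub_payoff = A[np.ix_(population, population)]
        sigma = meta_solver(sub_payoff)        # outer optimization
        br = best_response(population, sigma)  # inner optimization
        print(f"iter {t}: population {len(population)}, "
              f"exploitability {exploitability(population, sigma):.3f}")
        if br not in population:
            population.append(br)

    In the paper's setting, each payoff entry would itself be estimated from simulated MaCA engagements and the population would hold neural policies rather than pure-strategy indices; the shrinking exploitability printed by this loop is the quantity the abstract refers to.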
  • [1]
    QUAN Siji, QIAN Weiping, GUQ J, et al. Radar-communication integration: An overview[C]. The 7th IEEE/International Conference on Advanced Infocomm Technology, Fuzhou, China, 2014: 98–103.
    [2]
    李光久, 李昕. 博弈论简明教程[M]. 镇江: 江苏大学出版社, 2013.

    LI Guangjiu and LI Xin. A Brief Tutorial on Game Theory[M]. Zhenjiang: Jiangsu University Press, 2013.
    [3]
    MUKHERJEE A and SWINDLEHURST A L. Jamming games in the MIMO wiretap channel with an active eavesdropper[J]. IEEE Transactions on Signal Processing, 2013, 61(1): 82–91. doi: 10.1109/TSP.2012.2222386
    [4]
    NGUYEN D N and KRUNZ M. Power minimization in MIMO cognitive networks using beamforming games[J]. IEEE Journal on Selected Areas in Communications, 2013, 31(5): 916–925. doi: 10.1109/JSAC.2013.130510
    [5]
    DELIGIANNIS A, PANOUI A, LAMBOTHARAN S, et al. Game-theoretic power allocation and the Nash equilibrium analysis for a multistatic MIMO radar network[J]. IEEE Transactions on Signal Processing, 2017, 65(24): 6397–6408. doi: 10.1109/TSP.2017.2755591
    [6]
    LIU Pengfei, WANG Lei, SHAN Zhao, et al. A dynamic game strategy for radar screening pulse width allocation against jamming using reinforcement learning[J]. IEEE Transactions on Aerospace and Electronic Systems, 2023, 59(5): 6059–6072.
    [7]
    LI Kang, JIU Bo, PU Wenqiang, et al. Neural fictitious self-play for radar antijamming dynamic game with imperfect information[J]. IEEE Transactions on Aerospace and Electronic Systems, 2022, 58(6): 5533–5547. doi: 10.1109/TAES.2022.3175186
    [8]
    ZHAO Chenyu, WANG Qing, LIU Xiaofeng, et al. Reinforcement learning based a non-zero-sum game for secure transmission against smart jamming[J]. Digital Signal Processing, 2021, 112: 103002. doi: 10.1016/j.dsp.2021.103002
    [9]
    LI Wen, CHEN Jin, LIU Xin, et al. Intelligent dynamic spectrum anti-jamming communications: a deep reinforcement learning perspective[J]. IEEE Wireless Communications, 2022, 29(5): 60–67. doi: 10.1109/MWC.103.2100365
    [10]
    LECUN Y, BENGIO Y, and HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436–444. doi: 10.1038/nature14539
    [11]
    ANDREW A M. Reinforcement learning: An introduction by Richard S. Sutton and Andrew G. Barto, Adaptive computation and machine learning series, MIT Press (Bradford book), Cambridge, Mass., 1998, xviii + 322 pp, ISBN 0-262-19398-1, (hardback, £31.95)[J].Robotica, 1999, 17(2): 229–235.
    [12]
    WANG Haozhi, WANG Qing, and CHEN Qi. Opponent’s dynamic prediction model-based power control scheme in secure transmission and smart jamming game[J]. IEEE Internet of Things Journal, To be published.
    [13]
    KRISHNAMURTHY V, ANGLEY D, EVANS R, et al. Identifying cognitive radars-inverse reinforcement learning using revealed preferences[J]. IEEE Transactions on Signal Processing, 2020, 68: 4529–4542. doi: 10.1109/TSP.2020.3013516
    [14]
    PATTANAYAK K, KRISHNAMURTHY V, and BERRY C. How can a cognitive radar mask its cognition?[C]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022: 5897–5901.
    [15]
    LAURENT G J, MATIGNON L, LE FORT-PIAT N, et al. The world of independent learners is not Markovian[J]. International Journal of Knowledge-based and Intelligent Engineering Systems, 2011, 15(1): 55–64. doi: 10.3233/KES-2010-0206
    [16]
    LEIBO J Z, HUGHES E, LANCTOT M, et al. Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research[EB/OL].https://arxiv.org/abs/1903.00742, 2019.
    [17]
    PORTELAS R, COLAS C, WENG Lilian, et al. Automatic curriculum learning for deep RL: A short survey[C]. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 2020: 4819–4825.
    [18]
    LANCTOT M, ZAMBALDI V, GRUSLYS A, et al. A unified game-theoretic approach to multiagent reinforcement learning[C]. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 4193–4206.
    [19]
    BAKER B, KANITSCHEIDER I, MARKOV T, et al. Emergent tool use from multi-agent autocurricula[C]. 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020.
    [20]
    SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of go without human knowledge[J]. Nature, 2017, 550(7676): 354–359. doi: 10.1038/nature24270
    [21]
    JADERBERG M, CZARNECKI W M, DUNNING I, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning[J]. Science, 2019, 364(6443): 859–865. doi: 10.1126/science.aau6249
    [22]
    XU Zhongwen, VAN HASSELT H, and SILVER D. Meta-gradient reinforcement learning[C]. Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 2018: 2402–2413.
    [23]
    XU Zhongwen, VAN HASSELT H P, HESSEL M, et al. Meta-gradient reinforcement learning with an objective discovered online[C]. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 1279.
    [24]
    FINN C, ABBEEL P, and LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017: 1126–1135.
    [25]
    FENG Xidong, SLUMBERS O, WAN Ziyu, et al. Neural auto-curricula in two-player zero-sum games[C/OL]. Proceedings of the 34th International Conference on Neural Information Processing Systems, 2021, 3504–3517.
    [26]
    HOLLAND J H. Genetic algorithms[J]. Scientific American, 1992, 267(1): 66–72. doi: 10.1038/scientificamerican0792-66
    [27]
    SEHGAL A, LA Hung, LOUIS S, et al. Deep reinforcement learning using genetic algorithm for parameter optimization[C]. 2019 Third IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 2019: 596–601.
    [28]
    LIU Haoqiang, ZONG Zefang, LI Yong, et al. NeuroCrossover: An intelligent genetic locus selection scheme for genetic algorithm using reinforcement learning[J]. Applied Soft Computing, 2023, 146: 110680. doi: 10.1016/j.asoc.2023.110680
    [29]
    SONG Yanjie, WEI Luona, YANG Qing, et al. RL-GA: A reinforcement learning-based genetic algorithm for electromagnetic detection satellite scheduling problem[J]. Swarm and Evolutionary Computation, 2023, 77: 101236. doi: 10.1016/j.swevo.2023.101236
    [30]
    CETC-TFAI. MaCA[EB/OL].https://github.com/CETC-TFAI/MaCA, 2023.
    [31]
    MULLER P, OMIDSHAFIEI S, ROWLAND M, et al. A generalized training approach for multiagent learning[C]. 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020: 1–35.
    [32]
    TUYLS K, PEROLAT J, LANCTOT M, et al. A generalised method for empirical game theoretic analysis[C]. 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, 2018: 77–85.
    [33]
    OH J, HESSEL M, CZARNECKI W M, et al. Discovering reinforcement learning algorithms[C]. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 1060–1070.
    [34]
    BALDUZZI D, GARNELO M, BACHRACH Y, et al. Open-ended learning in symmetric zero-sum games[C]. Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, 2019: 434–443.
    [35]
    罗俊仁, 张万鹏, 袁唯淋, 等. 面向多智能体博弈对抗的对手建模框架[J]. 系统仿真学报, 2022, 34(9): 1941–1955. doi: 10.16182/j.issn1004731x.joss.21-0363

    LUO Junren, ZHANG Wanpeng, YUAN Weilin, et al. Research on opponent modeling framework for multi-agent game confrontation[J]. Journal of System Simulation, 2022, 34(9): 1941–1955. doi: 10.16182/j.issn1004731x.joss.21-0363
    [36]
    CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL].https://arxiv.org/abs/1412.3555, 2014.
    [37]
    BERGER U. Brown's original fictitious play[J]. Journal of Economic Theory, 2007, 135(1): 572–578. doi: 10.1016/j.jet.2005.12.010
    [38]
    LONG J, SHELHAMER E, and DARRELL T. Fully convolutional networks for semantic segmentation[C]. Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 3431–3440.