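Weapon-Target Assignment Optimization Based on Multi-attribute Decision-making and Deep Q-Network for Missile Defense System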

XIE Junwei; FANG Feng; PENG Dongliang; REN Jinlei; WANG Changping

doi: 10.11999/JEIT211136; cstr: 32379.14.JEIT211136

1. School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
2. China Academy of Launch Vehicle Technology, Beijing 100076, China

Funds: National Natural Science Foundation of China (61673146); Zhejiang Provincial University Research Foundation (GK209907299001-021)

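Author biographies:
XIE Junwei: male, PhD candidate; research interests: intelligent decision-making and control

FANG Feng: male, lecturer, PhD; research interests: cooperative guidance and control of flight vehicles, intelligent decision-making

PENG Dongliang: male, professor, PhD, doctoral supervisor; research interests: information fusion, detection and estimation

REN Jinlei: male, engineer, MSc; research interests: flight vehicle design, trajectory navigation, guidance and control, intelligent control

WANG Changping: male, master's student; research interests: cooperative missile guidance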

Corresponding author:
FANG Feng, fangf@hdu.edu.cn

Chinese Library Classification (CLC) number: TP183; TJ761.7
Publication history
- Received: 2021-10-15
- Revised: 2022-01-10
- Accepted: 2022-01-14
- Available online: 2022-02-02
- Published: 2022-11-14

Abstract

In medium- and large-scale Weapon-Target Assignment (WTA) problems, the decision space grows rapidly with problem size (the curse of dimensionality), which makes the search for an optimal assignment inefficient. To address this, a WTA optimization approach based on multi-attribute decision-making and a Deep Q-Network (DQN) is proposed. First, a threat-assessment model for attacking missiles is built on the Analytic Hierarchy Process (AHP), and the entropy method is introduced to capture the differences among target attributes, which makes the threat assessment more objective. Then, following the criterion of maximum kill probability, a multi-step WTA decision model is built in the DQN framework. A uniform sampling strategy over the experience pool is introduced so that assignment experience for each target type is drawn with equal probability. Furthermore, a reward function that combines local and global rewards is designed to balance the training efficiency and the decision accuracy of the DQN assignment model. Simulation results show that, compared with traditional heuristic methods, the proposed approach can solve large-scale WTA problems quickly online and is robust to changes in WTA scenario elements.

Keywords:
- Weapon-Target Assignment (WTA)
- Deep Q-Network (DQN)
- Threat assessment
- Improved Analytic Hierarchy Process (AHP)
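
The abstract describes sampling the experience pool so that assignment experience for every target type is drawn with equal probability. The page does not give the implementation, so the following is only a minimal sketch of one way to do this: keep a separate queue per target type and pick the type uniformly before picking a transition. The class name `StratifiedReplayBuffer`, its parameters, and the transition layout are assumptions made for illustration, not the authors' code.

```python
import random
from collections import defaultdict, deque


class StratifiedReplayBuffer:
    """Replay buffer with one FIFO queue per target type (hypothetical design)."""

    def __init__(self, capacity_per_type, seed=None):
        # Each target type gets its own bounded queue; old items fall out first.
        self._queues = defaultdict(lambda: deque(maxlen=capacity_per_type))
        self._rng = random.Random(seed)

    def add(self, target_type, transition):
        # transition can be any (state, action, reward, next_state, done) tuple.
        self._queues[target_type].append(transition)

    def sample(self, batch_size):
        # Choose the target type uniformly first, then a transition inside it,
        # so rarely seen target types are not crowded out by common ones.
        types = [t for t, q in self._queues.items() if q]
        if not types:
            raise ValueError("buffer is empty")
        batch = []
        for _ in range(batch_size):
            t = self._rng.choice(types)
            batch.append((t, self._rng.choice(self._queues[t])))
        return batch
```

With this layout a target type that contributed only a handful of transitions is still sampled about as often as a type that contributed thousands, which is the property the abstract asks for.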

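Figure 1 Framework of the improved AHP method

Figure 2 DQN-based WTA decision model

Figure 3 DQN training performance in the fixed scenario

Figure 4 DQN weapon-target assignment scheme in the fixed scenario

Figure 5 DQN training performance in the fixed scenario when only the global reward is considered

Figure 6 Training over 1000 Monte Carlo simulations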

Table 1 Target attribute values

No.	Importance of attacked site	Remaining flight time (s)	Maximum altitude (km)	Burnout velocity (km/s)	RCS (m²)
1	4	220	260	2.3	0.007
2	9	250	225	2.1	0.005
3	4	530	630	4.2	0.012
4	2	550	680	4.8	0.013
5	6	240	235	2.2	0.010
6	2	610	710	5.1	0.015
7	1	1200	1600	6.8	0.017
8	0	1120	1450	6.6	0.016
9	2	1400	75	7.4	0.006
10	3	1500	78	7.1	0.007
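
To illustrate how an entropy method can turn attribute differences like those in Table 1 into objective weights, here is a minimal sketch of the standard entropy weight method applied to the table's values. It assumes the raw values are used directly as column shares, with no benefit/cost direction handling or other normalization; the paper's actual preprocessing is not given on this page, so the output is not expected to reproduce the weights reported in Table 2.

```python
import math

# Rows are targets 1-10; columns follow Table 1:
# importance, remaining flight time (s), maximum altitude (km),
# burnout velocity (km/s), RCS (m^2).
ATTRIBUTES = [
    [4, 220, 260, 2.3, 0.007],
    [9, 250, 225, 2.1, 0.005],
    [4, 530, 630, 4.2, 0.012],
    [2, 550, 680, 4.8, 0.013],
    [6, 240, 235, 2.2, 0.010],
    [2, 610, 710, 5.1, 0.015],
    [1, 1200, 1600, 6.8, 0.017],
    [0, 1120, 1450, 6.6, 0.016],
    [2, 1400, 75, 7.4, 0.006],
    [3, 1500, 78, 7.1, 0.007],
]


def entropy_weights(matrix):
    """Return one weight per column using the standard entropy weight method."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        # Share of each target in this attribute; 0 * log(0) is taken as 0.
        shares = [v / total for v in column]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n_rows)
        # Attributes whose values differ more across targets get lower entropy
        # and therefore a larger divergence (and weight).
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


weights = entropy_weights(ATTRIBUTES)
```

An attribute that is nearly identical across all targets carries little discriminating information and receives a weight close to zero under this scheme.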

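Table 2 Comparison of evaluation-index weights computed by the traditional and improved AHP methods

Method	Importance of attacked site	Remaining flight time (s)	Maximum altitude (km)	Burnout velocity (km/s)	RCS (m²)
Traditional AHP	0.34	0.27	0.08	0.12	0.19
Improved AHP	0.44	0.17	0.16	0.13	0.10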
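
For readers unfamiliar with how AHP produces weights such as those in the "Traditional AHP" row, the sketch below derives a priority vector from a pairwise judgement matrix with the row geometric mean approximation and checks Saaty's consistency ratio. The 3x3 judgement matrix is hypothetical and is not taken from the paper, and the paper may use a different weight-extraction method (for example, the principal eigenvector).

```python
import math

# Saaty's random consistency index for matrix orders 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Approximate the AHP priority vector with the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(pairwise, weights):
    """Estimate lambda_max from A*w and return CR = CI / RI."""
    n = len(pairwise)
    if RANDOM_INDEX[n] == 0.0:
        return 0.0  # Matrices of order 1 or 2 are always consistent.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in pairwise]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgement matrix: entry [i][j] says how much more important
# criterion i is than criterion j on Saaty's 1-9 scale.
example = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
w = ahp_weights(example)
cr = consistency_ratio(example, w)
```

A consistency ratio below 0.1 is the usual threshold for accepting the judgements; above it, the pairwise comparisons are normally revised.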

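Table 3 Target threat assessment results of the improved and traditional AHP methods

Target No.	8	7	9	4	6
Improved AHP	0.125	0.119	0.111	0.107	0.106
Traditional AHP	0.115	0.110	0.104	0.107	0.106
Target No.	10	3	1	5	2
Improved AHP	0.104	0.095	0.091	0.078	0.060
Traditional AHP	0.099	0.097	0.097	0.088	0.075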

Table 4 Test case parameters

Test case	Target quantity ratio	Interceptor quantity ratio
#1	5:5:3:2	12:8:5
#2	10:8:5:2	18:15:12
#3	12:9:9:5	25:15:10

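Table 5 Test results for the three scenarios

Metric	Test case	DQN	PSO	Random
Overall kill probability	#1	0.921	0.982	0.620
Overall kill probability	#2	0.918	0.907	0.590
Overall kill probability	#3	0.856	0.758	0.540
Runtime (s)	#1	0.050	22.001	0.001
Runtime (s)	#2	0.170	62.021	0.003
Runtime (s)	#3	0.220	137.000	0.019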
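
To make the "overall kill probability" metric concrete, the sketch below scores an assignment with one common WTA formulation: a target survives only if every interceptor assigned to it misses, and the per-target kill probabilities are combined using threat weights. Whether the paper weights targets exactly this way is an assumption; the function names and the toy data are hypothetical and do not come from the paper.

```python
import random


def overall_kill_probability(assignment, kill_prob, threat):
    """Threat-weighted probability of destroying the targets.

    assignment[i] is the target index given to interceptor i,
    kill_prob[i][j] is the single-shot kill probability of interceptor i on target j,
    threat[j] is the normalized threat weight of target j (weights sum to 1).
    """
    survive = [1.0] * len(threat)
    for i, j in enumerate(assignment):
        # Independent shots: the target survives only if every shot misses.
        survive[j] *= 1.0 - kill_prob[i][j]
    return sum(t * (1.0 - s) for t, s in zip(threat, survive))


def random_assignment(n_interceptors, n_targets, rng):
    """Baseline: each interceptor picks a target uniformly at random."""
    return [rng.randrange(n_targets) for _ in range(n_interceptors)]


# Hypothetical toy data, not from the paper.
threat = [0.5, 0.3, 0.2]
kill_prob = [
    [0.8, 0.6, 0.5],
    [0.7, 0.9, 0.4],
    [0.6, 0.5, 0.9],
    [0.8, 0.7, 0.6],
]
plan = [0, 1, 2, 0]  # Interceptors 0 and 3 both engage the highest-threat target.
score = overall_kill_probability(plan, kill_prob, threat)

baseline = overall_kill_probability(
    random_assignment(4, 3, random.Random(0)), kill_prob, threat
)
```

Scoring a single assignment this way is cheap; the cost of a heuristic such as PSO comes from evaluating many candidate assignments per run, whereas a trained DQN needs only one forward pass per assignment step.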
