An Intelligent Decision-making Algorithm for Communication Countermeasure Jamming Resource Allocation
Abstract: To address intelligent decision-making in battlefield communication countermeasures, this paper proposes a jamming resource allocation decision-making algorithm based on Bootstrapped expert trajectory memory replay and Hierarchical reinforcement learning (BHJM), built on the idea of overall confrontation. For the frequency-hopping jamming decision problem, the algorithm first partitions the jamming spectrum into sub-bands according to the distribution of frequency points, then decides the jamming band and the jamming bandwidth at separate levels of a hierarchical reinforcement learning model, and finally samples and trains the model with a bootstrapped expert trajectory experience replay mechanism. Under the available jamming resources, and especially when those resources are insufficient, the algorithm jams the most threatening targets first, achieving the best jamming effect while reducing the total jamming bandwidth. Simulation results show that, compared with existing resource allocation decision algorithms, the proposed algorithm saves 25% of jammer resources and reduces the jamming bandwidth by 15%, which gives it considerable practical value.
Table 1 Target attributes
Target    Attribute                 Threat coefficient
${N_1}$   Communication network 1   6
${N_2}$   Communication network 2   5
${N_3}$   Communication network 3   4
${N_4}$   Communication network 4   3
${N_5}$   Communication network 5   2
${N_6}$   Communication network 6   1
Table 2 Jamming resource allocation algorithm
Algorithm 1 Jamming resource allocation algorithm based on the idea of overall confrontation
(1) Order the targets for jamming by threat coefficient as $[{T_1},{T_2},\cdots,{T_M}]$, where ${T_1}$ is the first target and ${T_M}$ is the last;
(2) Divide all target frequency points into sub-bands $[{J_{{S_1}}},{J_{{S_2}}},\cdots,{J_{{S_Y}}}]$ according to the maximum jamming bandwidth ${B_{\max }}$ of a jammer;
Repeat:
(3) Assign jamming band ${J_{{S_1}}}$ to jammer ${J_1}$;
(4) Set the jamming bandwidth ${B_1}$ for band ${J_{{S_1}}}$ by finding the frequency set within the bandwidth that contains the largest number of frequency points of ${T_1}$; this bandwidth range is the barrage jamming band, and ${{\boldsymbol{P}}_1} = [{J_{{S_1}}},{B_1}]$ is one part of the jamming policy $\pi$;
(5) Repeat steps (3) and (4) until every target is completely blocked or all jamming resources are used up; at this point $K$ sub-policies have been generated and the jamming policy is $\pi = [{P_1},{P_2},\cdots,{P_K}]$ (a code sketch follows this table).
Break.
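To make the greedy allocation of Algorithm 1 concrete, the following is a minimal Python sketch. The `Target` class, the `partition_bands` helper, and the coverage-based bandwidth rule are illustrative assumptions; only the threat-ordered, band-then-bandwidth greedy structure follows the table above.

```python
# Minimal sketch of Algorithm 1 (overall-confrontation jamming resource allocation).
# Target, partition_bands and the bandwidth rule are illustrative assumptions; only
# the threat-ordered, band-then-bandwidth greedy structure follows the table above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Target:
    threat: float       # threat coefficient (higher = jam earlier)
    freqs: List[float]  # frequency points used by the target (MHz)

def partition_bands(freqs: List[float], b_max: float) -> List[Tuple[float, float]]:
    """Step (2): split all target frequency points into sub-bands no wider than B_max."""
    bands, start = [], None
    for f in sorted(freqs):
        if start is None or f - start > b_max:
            bands.append([f, f])
            start = f
        else:
            bands[-1][1] = f
    return [tuple(b) for b in bands]

def allocate(targets: List[Target], n_jammers: int, b_max: float):
    """Steps (1) and (3)-(5): greedily jam the most threatening targets first."""
    targets = sorted(targets, key=lambda t: t.threat, reverse=True)      # step (1)
    bands = partition_bands([f for t in targets for f in t.freqs], b_max)
    policy = []                                                          # pi = [P_1, ..., P_K]
    for target, _ in zip(targets, range(n_jammers)):                     # one jammer per target
        # choose the sub-band that covers the most frequency points of this target
        best = max(bands, key=lambda b: sum(b[0] <= f <= b[1] for f in target.freqs))
        covered = [f for f in target.freqs if best[0] <= f <= best[1]]
        bandwidth = min(max(covered) - min(covered), b_max) if covered else 0.0
        policy.append((best, bandwidth))                                 # sub-policy P_k = [J_Sk, B_k]
    return policy
```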
Table 3 BETMR algorithm
Algorithm 2 BETMR algorithm
(1) Create the experience pools ${{\boldsymbol{E}}_{{\bf{normal}}}}$ and ${{\boldsymbol{E}}_{{\bf{expert}}}}$, both initialized to the empty set;
(2) Set the initial threshold $\delta$, $\delta = {\delta _0}$;
Repeat:
(3) Store sample $e$ in ${{\boldsymbol{E}}_{{\bf{normal}}}}$;
(4) When the episode ends: Break;
(5) Check whether the samples of this episode satisfy the expert trajectory condition; if they do, store them in ${{\boldsymbol{E}}_{{\bf{expert}}}}$; otherwise, pass;
(6) Compute $\delta$ for the next episode according to Eq. (6); if $\delta$ changes, reset ${{\boldsymbol{E}}_{{\bf{expert}}}}$ to the empty set;
(7) Draw the training samples ${\boldsymbol{E}}$ according to Eq. (7) (a code sketch follows this table).
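A minimal Python sketch of the BETMR replay mechanism follows. The expert trajectory condition (episode return at least the threshold $\delta$), the threshold update rule, and the expert/normal mixing ratio are illustrative stand-ins for Eqs. (6) and (7), which are not reproduced here.

```python
# Minimal sketch of the BETMR replay mechanism (Algorithm 2). The expert condition,
# the threshold update and the mixing ratio are stand-ins for Eqs. (6)-(7).
import random
from collections import deque

class BETMRBuffer:
    def __init__(self, capacity=50_000, delta0=0.0, expert_ratio=0.3):
        self.normal = deque(maxlen=capacity)   # E_normal
        self.expert = deque(maxlen=capacity)   # E_expert
        self.delta = delta0                    # expert trajectory threshold
        self.expert_ratio = expert_ratio       # share of expert samples per batch
        self._episode = []                     # samples of the current episode

    def store(self, transition, done, episode_return=None):
        """Steps (3)-(6): buffer the sample; at episode end, promote expert trajectories."""
        self.normal.append(transition)
        self._episode.append(transition)
        if not done:
            return
        if episode_return is not None and episode_return >= self.delta:   # step (5)
            new_delta = max(self.delta, episode_return)                   # stand-in for Eq. (6)
            if new_delta != self.delta:
                self.delta = new_delta
                self.expert.clear()   # step (6): reset E_expert when delta changes
            # the episode that met the condition repopulates the expert pool
            self.expert.extend(self._episode)
        self._episode = []

    def sample(self, batch_size):
        """Step (7): mix expert and normal samples (stand-in for Eq. (7))."""
        n_expert = min(int(batch_size * self.expert_ratio), len(self.expert))
        n_normal = min(batch_size - n_expert, len(self.normal))
        return (random.sample(list(self.expert), n_expert)
                + random.sample(list(self.normal), n_normal))
```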
Table 4 BHJM algorithm
Algorithm 3 BHJM algorithm
(1) Initialize the jamming band and jamming bandwidth decision-makers; for each, build two neural networks: an evaluation network with weights $\theta$ and a target network with weights $\theta^-$;
Repeat:
While $j < W$:
(2) The target generator obtains environment state ${S_0}$ and outputs the jamming target $g$;
(3) The jamming band decision-maker obtains environment state ${S_1}$ and decides the jamming band, i.e., jamming action ${A_1}$;
(4) The jamming bandwidth decision-maker obtains environment state ${S_2}$ and decides the jamming bandwidth, i.e., jamming action ${A_2}$;
(5) The effect evaluator computes the rewards ${r_1}$ and ${r_2}$;
(6) The two decision-makers store samples ${{\boldsymbol{E}}_1}$ and ${{\boldsymbol{E}}_2}$ according to the BETMR mechanism;
(7) Obtain the next environment states ${S_1}'$ and ${S_2}'$;
(8) When the jamming task is completed or the jamming resources are exhausted: Break;
(9) After the current episode ends, the two decision-makers update their respective evaluation networks according to Eq. (14);
(10) Every $L$ episodes, update the target networks of the two decision-makers according to Eqs. (16) and (17);
(11) The loop ends once the algorithm has been trained to the optimum (a code sketch follows this table).
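As a structural illustration of how the two-level decision-makers and the BETMR replay fit together, here is a minimal PyTorch sketch. The network sizes, the epsilon-greedy exploration, the one-step TD loss, and the learning rate are illustrative stand-ins for Eqs. (14), (16) and (17), which are not reproduced here; the `BETMRBuffer` class is reused from the previous sketch.

```python
# Minimal structural sketch of the BHJM decision-makers (Algorithm 3). Network sizes,
# exploration and the TD update are stand-ins for Eqs. (14), (16)-(17); BETMRBuffer
# is the class from the previous sketch.
import copy
import torch
import torch.nn as nn

def make_dqn(state_dim, n_actions):
    """Step (1): evaluation network (weights theta) and its target copy (weights theta-)."""
    net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    return net, copy.deepcopy(net)

class BHJMAgent:
    def __init__(self, band_dim, bw_dim, n_bands, n_bandwidths, gamma=0.9):
        self.band_q, self.band_q_target = make_dqn(band_dim, n_bands)   # band decision-maker
        self.bw_q, self.bw_q_target = make_dqn(bw_dim, n_bandwidths)    # bandwidth decision-maker
        self.band_buf, self.bw_buf = BETMRBuffer(), BETMRBuffer()       # samples E_1 and E_2
        self.gamma = gamma

    def act(self, q_net, state, eps=0.1):
        """Steps (3)-(4): epsilon-greedy action from the given decision-maker."""
        if torch.rand(1).item() < eps:
            return torch.randint(q_net[-1].out_features, (1,)).item()
        with torch.no_grad():
            return q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

    def learn(self, q_net, q_target, buf, batch_size=32, lr=1e-3):
        """Step (9), stand-in for Eq. (14): one-step TD update from BETMR samples."""
        batch = buf.sample(batch_size)           # transitions assumed to be (s, a, r, s')
        if not batch:
            return
        s, a, r, s2 = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
        with torch.no_grad():
            y = r + self.gamma * q_target(s2).max(dim=1).values          # TD target
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, y)
        loss.backward()
        for p in q_net.parameters():                                     # plain SGD for brevity
            p.data -= lr * p.grad
            p.grad = None

    def sync_targets(self):
        """Step (10), stand-in for Eqs. (16)-(17): copy evaluation weights into the targets."""
        self.band_q_target.load_state_dict(self.band_q.state_dict())
        self.bw_q_target.load_state_dict(self.bw_q.state_dict())
```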
Table 5 Reconnaissance target information
Target    Threat coefficient   Communication distance (km)   Communication transmit power (W)   Jamming distance (km)
${N_1}$   6   20   100   30
${N_2}$   5   20   100   50
${N_3}$   4   50   100   70
${N_4}$   3   50   100   90
${N_5}$   2   20   100   110
${N_6}$   1   20   100   130