An Intelligent Decision-making Algorithm for Communication Countermeasure Jamming Resource Allocation

Hua XU, Bailin SONG, Lei JIANG, Ning RAO, Yunhao SHI

Citation: Hua XU, Bailin SONG, Lei JIANG, Ning RAO, Yunhao SHI. An Intelligent Decision-making Algorithm for Communication Countermeasure Jamming Resource Allocation[J]. Journal of Electronics & Information Technology, 2021, 43(11): 3086-3095. doi: 10.11999/JEIT210115


doi: 10.11999/JEIT210115
Article Information
    Author biographies:

    Hua XU: male, born in 1976, professor and doctoral supervisor; research interests: communication signal processing, intelligent communication countermeasures

    Bailin SONG: male, born in 1997, master's student; research interest: intelligent decision-making for communication countermeasures

    Lei JIANG: male, born in 1974, associate professor; research interests: communication anti-jamming, intelligent communication countermeasures

    Ning RAO: male, born in 1997, master's student; research interest: intelligent decision-making for communication countermeasures

    Yunhao SHI: male, born in 1996, PhD student; research interest: communication signal recognition

    Corresponding author:

    Bailin SONG, songbail@126.com

  • CLC number: TN975

An Intelligent Decision-making Algorithm for Communication Countermeasure Jamming Resource Allocation

  • Abstract: To address the problem of intelligent decision-making in battlefield communication countermeasures, this paper proposes a jamming resource allocation decision algorithm based on Bootstrapped expert trajectory Hierarchical reinforcement learning for Jamming resource allocation (BHJM), built on the idea of holistic confrontation. To handle the difficulty of deciding how to jam frequency-hopping communications, the algorithm first partitions the jamming frequency bands according to the distribution of target frequency points, then uses a hierarchical reinforcement learning model to decide the jamming band and the jamming bandwidth level by level, and finally samples and trains the model with an experience replay mechanism based on bootstrapped expert trajectories. As a result, with the available jamming resources, and especially when those resources are insufficient, the algorithm jams the most threatening targets first, achieving the best jamming effect while reducing the total jamming bandwidth. Simulation results show that, compared with existing resource allocation decision algorithms, the proposed algorithm saves 25% of jamming-station resources and reduces the jamming bandwidth by 15%, and therefore has considerable practical value.
  • Fig. 1  Typical jamming scenario

    Fig. 2  Frequency distribution over 200–400 MHz

    Fig. 3  Algorithm flow structure

    Fig. 4  Experience replay mechanism based on bootstrapped expert trajectories

    Fig. 5  Distribution of the target frequency sets

    Fig. 6  Jamming effect with different numbers of jamming stations

    Fig. 7  Comparison of jamming effects

    Fig. 8  Comparison of jamming bandwidths

    Fig. 9  Comparison of decision performance

    Table 1  Target attributes

    Target     Attribute                  Threat coefficient
    ${N_1}$    Communication network 1    6
    ${N_2}$    Communication network 2    5
    ${N_3}$    Communication network 3    4
    ${N_4}$    Communication network 4    3
    ${N_5}$    Communication network 5    2
    ${N_6}$    Communication network 6    1

    Table 2  Jamming resource allocation algorithm

     Algorithm 1  Jamming resource allocation algorithm based on the idea of holistic confrontation
     (1) Set the jamming order of the targets $[{T_1},{T_2},\cdots,{T_M}]$ according to their threat coefficients, where ${T_1}$ is the first target and ${T_M}$ the last;
     (2) Partition all target frequency points into sub-bands $[{J_{{S_1}}},{J_{{S_2}}},\cdots,{J_{{S_Y}}}]$ according to the maximum jamming bandwidth ${B_{\max }}$ of a jammer;
     Repeat:
     (3) Select the jamming band ${J_{{S_1}}}$ for jammer ${J_1}$;
     (4) Set the jamming bandwidth ${B_1}$ according to band ${J_{{S_1}}}$ and find the frequency set within this bandwidth that contains the largest number of frequency points of ${T_1}$; this bandwidth range is the barrage jamming band, and ${{\boldsymbol{P}}_1} = [{J_{{S_1}}},{B_1}]$ is one part of the jamming strategy $\pi$;
     (5) Repeat steps 3 and 4 until all targets are completely blocked or all jamming resources are used up; at this point $K$ sub-strategies have been generated, and the jamming strategy is $\pi = [{P_1},{P_2},\cdots,{P_K}]$.
     Break.
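
    To make the greedy allocation in Algorithm 1 concrete, the following is a minimal Python sketch, not the authors' implementation: targets are assumed to be given as (threat coefficient, frequency-point list) pairs, the sub-band partition is a simple left-to-right grouping under the maximum bandwidth, and all function, parameter, and value names are illustrative.

```python
# Minimal illustrative sketch of Algorithm 1 (not the authors' code): greedy
# allocation of jamming stations to the highest-threat targets first.

def allocate_jammers(targets, num_jammers, b_max):
    """targets: list of (threat_coefficient, [frequency points in MHz]).
    Returns a jamming strategy as a list of (band_start, bandwidth) pairs."""
    # Step (1): jam targets in descending order of threat coefficient.
    order = sorted(targets, key=lambda t: t[0], reverse=True)

    # Step (2): partition all target frequency points into sub-bands no wider
    # than the maximum jamming bandwidth b_max (simple left-to-right grouping).
    freqs = sorted({f for _, pts in order for f in pts})
    if not freqs:
        return []
    sub_bands, current = [], [freqs[0]]
    for f in freqs[1:]:
        if f - current[0] <= b_max:
            current.append(f)
        else:
            sub_bands.append(current)
            current = [f]
    sub_bands.append(current)

    # Steps (3)-(5): each jamming station takes the sub-band that covers the
    # most not-yet-blocked frequency points of the highest-threat open target.
    strategy, blocked = [], set()
    for _ in range(num_jammers):
        open_targets = [t for t in order if not set(t[1]) <= blocked]
        if not open_targets:
            break  # every target is completely blocked
        top = open_targets[0]
        band = max(sub_bands, key=lambda b: len((set(b) & set(top[1])) - blocked))
        strategy.append((band[0], band[-1] - band[0]))  # barrage band: start, width
        blocked |= set(band)
    return strategy

# Hypothetical usage with threat coefficients as in Table 1:
# allocate_jammers([(6, [225, 232, 238]), (5, [301, 317]), (4, [355, 362])],
#                  num_jammers=2, b_max=20)
```

    The design choice mirrored here is that jamming stations are always spent on the highest-threat target that is not yet blocked, so limited resources are never consumed on low-threat targets first.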

    Table 3  BETMR algorithm

     Algorithm 2  BETMR algorithm
     (1) Create the experience pools ${{\boldsymbol{E}}_{{\bf{normal}}}}$ and ${{\boldsymbol{E}}_{{\bf{expert}}}}$ and initialize them as empty sets;
     (2) Set the initial threshold $\delta$, $\delta = {\delta _0}$;
     Repeat:
     (3) Store each sample $e$ in ${{\boldsymbol{E}}_{{\bf{normal}}}}$;
     (4) When the episode ends:
     Break
     (5) Check whether the samples of this episode satisfy the expert trajectory condition;
     if satisfied: store the samples $e$ in ${{\boldsymbol{E}}_{{\bf{expert}}}}$;
     if not: pass
     (6) Compute $\delta$ for the next episode according to Eq. (6); if $\delta$ changes, reset ${{\boldsymbol{E}}_{{\bf{expert}}}}$ to the empty set;
     (7) Draw the sample batch ${\boldsymbol{E}}$ according to Eq. (7).
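
    A minimal sketch of the BETMR replay mechanism of Algorithm 2 follows, under the assumption that an episode counts as an expert trajectory when its return reaches the current threshold $\delta$. The threshold update of Eq. (6) and the sampling rule of Eq. (7) are not reproduced in this excerpt, so a caller-supplied threshold and a fixed expert-sampling ratio stand in for them; the class and parameter names are illustrative.

```python
import random
from collections import deque

# Illustrative sketch of the BETMR replay buffer of Algorithm 2 (not the
# paper's implementation). Eq. (6) and Eq. (7) are replaced by a
# caller-supplied threshold and a fixed expert sampling ratio.

class BETMRBuffer:
    def __init__(self, capacity=10000, expert_ratio=0.3):
        self.normal = deque(maxlen=capacity)   # E_normal: every transition
        self.expert = deque(maxlen=capacity)   # E_expert: expert trajectories only
        self.episode = []                      # transitions of the current episode
        self.expert_ratio = expert_ratio       # assumed fixed mixing ratio

    def store(self, transition):
        # Step (3): every transition goes into E_normal.
        self.normal.append(transition)
        self.episode.append(transition)

    def end_episode(self, episode_return, threshold):
        # Step (5): keep the whole episode as an expert trajectory if its
        # return reaches the current threshold delta.
        if episode_return >= threshold:
            self.expert.extend(self.episode)
        self.episode = []

    def reset_expert(self):
        # Step (6): called when the threshold delta changes.
        self.expert.clear()

    def sample(self, batch_size):
        # Step (7): mix expert and ordinary samples (fixed ratio here).
        n_expert = min(len(self.expert), int(batch_size * self.expert_ratio))
        batch = random.sample(list(self.expert), n_expert)
        batch += random.sample(list(self.normal),
                               min(len(self.normal), batch_size - n_expert))
        return batch
```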

    Table 4  BHJM algorithm

     Algorithm 3  BHJM algorithm
     (1) Initialize the jamming-band and jamming-bandwidth decision-makers; for each, build two neural networks: an evaluation network with weight parameters $\theta$ and a target network with weight parameters $\theta^-$;
     Repeat:
     While $j < W$:
     (2) The target generator obtains the environment state ${S_0}$ and outputs the jamming target $g$;
     (3) The jamming-band decision-maker obtains the environment state ${S_1}$ and decides the jamming band, i.e. jamming action ${A_1}$;
     (4) The jamming-bandwidth decision-maker obtains the environment state ${S_2}$ and decides the jamming bandwidth, i.e. jamming action ${A_2}$;
     (5) The effect evaluator computes the reward values ${r_1}$ and ${r_2}$;
     (6) The two decision layers store the samples ${{\boldsymbol{E}}_1}$ and ${{\boldsymbol{E}}_2}$ according to the BETMR mechanism;
     (7) Obtain the next environment states ${S_1}'$ and ${S_2}'$;
     (8) When the jamming task is completed or the jamming resources are exhausted:
     Break
     (9) After the current episode ends, the two decision layers update their respective evaluation networks according to Eq. (14);
     (10) Every $L$ episodes, update the target networks of the two decision layers according to Eq. (16) and Eq. (17);
     (11) The loop ends once the algorithm has been trained to the optimum.
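
    The two-level structure of Algorithm 3 can be sketched as below; this is an illustrative skeleton rather than the paper's implementation. The environment object `env` and its methods, the state and action dimensions, and the update rules of Eqs. (14), (16) and (17) are all placeholders, and the target generator of step (2) is folded into the environment for brevity.

```python
import copy
import random
import torch
import torch.nn as nn

# Illustrative skeleton of the BHJM training loop of Algorithm 3 (not the
# paper's code). Only the two-level decision structure is shown: the band
# decision-maker acts first, then the bandwidth decision-maker.

class DecisionLevel:
    """One decision layer: evaluation network (theta) and target network (theta-)."""
    def __init__(self, state_dim, n_actions):
        self.eval_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                      nn.Linear(64, n_actions))
        self.target_net = copy.deepcopy(self.eval_net)   # step (1)
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=1e-3)
        self.n_actions = n_actions

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy action selection on the evaluation network.
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.eval_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax())

    def sync_target(self):
        # Stand-in for the target-network updates of Eqs. (16)/(17).
        self.target_net.load_state_dict(self.eval_net.state_dict())

def train_bhjm(env, band_level, width_level, band_buffer, width_buffer,
               episodes=500, sync_every=10):
    # env is a hypothetical environment exposing reset(), band_state(), step().
    for episode in range(episodes):
        s1 = env.reset()                      # state S1 for the band decision-maker
        done = False
        while not done:                       # step (8) sets done when resources run out
            a1 = band_level.act(s1)           # step (3): decide the jamming band A1
            s2 = env.band_state(a1)           # state S2 for the bandwidth decision-maker
            a2 = width_level.act(s2)          # step (4): decide the jamming bandwidth A2
            s1_next, r1, r2, done = env.step(a1, a2)        # step (5): rewards r1, r2
            band_buffer.store((s1, a1, r1, s1_next, done))  # step (6): BETMR storage
            width_buffer.store((s2, a2, r2, s1_next, done))
            s1 = s1_next
        # Step (9): per-episode evaluation-network updates (Eq. (14)) would go here.
        if (episode + 1) % sync_every == 0:   # step (10): periodic target updates
            band_level.sync_target()
            width_level.sync_target()
```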

    Table 5  Reconnaissance target information

    Target     Threat coefficient    Communication distance (km)    Communication transmit power (W)    Jamming distance (km)
    ${N_1}$    6                     20                              100                                 30
    ${N_2}$    5                     20                              100                                 50
    ${N_3}$    4                     50                              100                                 70
    ${N_4}$    3                     50                              100                                 90
    ${N_5}$    2                     20                              100                                 110
    ${N_6}$    1                     20                              100                                 130
Publication history
  • Received: 2021-02-01
  • Revised: 2021-05-26
  • Available online: 2021-06-11
  • Issue published: 2021-11-23
