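Transmit Power Allocation Method of Frequency Diverse Array-Multi Input and Multi Output Radar Based on Reinforcement Learning

DING Zihang, XIE Junwei, QI Cheng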

Citation: DING Zihang, XIE Junwei, QI Cheng. Transmit Power Allocation Method of Frequency Diverse Array-Multi Input and Multi Output Radar Based on Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2023, 45(2): 550-557. doi: 10.11999/JEIT211555


doi: 10.11999/JEIT211555
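Details
    About the authors:

    DING Zihang: male, Ph.D. candidate; research interest: array optimization design for frequency diverse arrays

    XIE Junwei: male, professor; research interest: radar jamming and anti-jamming techniques

    QI Cheng: male, master's student; research interests: radar resource management and array optimization design

    Corresponding author:

    DING Zihang dingzihang0831@163.com

  • CLC number: TN958.5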


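  • Abstract: The electromagnetic environment is growing increasingly complex and variable, and new jamming techniques keep emerging, posing serious challenges and threats to radar systems. This paper introduces a spectral jamming model and proposes a Reinforcement Learning (RL)-based transmit power allocation optimization method within a dynamic game framework between a Frequency Diverse Array-Multiple-Input Multiple-Output (FDA-MIMO) radar and a jammer, so that the radar system attains the maximum Signal-to-Interference-plus-Noise Ratio (SINR). First, the spectral jamming model is constructed. Second, the interaction between the radar and the jammer is treated as a Stackelberg game, with the radar as leader and the jammer as follower, and a transmit power allocation optimization model is established under this dynamic game framework. The Deep Deterministic Policy Gradient (DDPG) algorithm is then adopted, with a reward function designed around the power constraint, to allocate the radar transmit power in real time and maximize the output SINR. Finally, simulation results show that, within the radar-jammer game framework, the proposed algorithm optimizes the radar transmit power effectively and gives the radar good anti-jamming performance.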
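    The leader-follower structure described in the abstract can be written, in generic form, as a bilevel problem. This is a sketch only: the symbols $\boldsymbol{p}$ (radar transmit power vector), $\boldsymbol{j}$ (jammer strategy), and the feasible sets $\mathcal{P}$ and $\mathcal{J}$ are illustrative names that do not come from this page, and the assumption that the jammer's goal is to minimize the radar SINR is ours, not a statement of the paper's exact model.

    $$ \boldsymbol{p}^{\star} = \arg\max_{\boldsymbol{p} \in \mathcal{P}} \ \mathrm{SINR}\left( \boldsymbol{p}, \boldsymbol{j}^{\star}(\boldsymbol{p}) \right), \qquad \boldsymbol{j}^{\star}(\boldsymbol{p}) = \arg\min_{\boldsymbol{j} \in \mathcal{J}} \ \mathrm{SINR}\left( \boldsymbol{p}, \boldsymbol{j} \right) $$

    Under this reading, the radar (leader) commits to an allocation while anticipating the jammer's best response, and $\mathcal{P}$ would encode the transmit power constraint mentioned in the abstract.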
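  • Fig. 1  Overall framework of the transmit power optimization method

    Fig. 2  Cumulative reward and SINR versus episode number

    Fig. 3  Transmit power allocation

    Fig. 4  Variation of the SINR

    Fig. 5  Power distribution of the jamming signals in the frequency-angle domain

    Fig. 6  Computational complexity versus number of transmit array elements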

     Algorithm 1  DDPG algorithm
     Randomly initialize the parameters $ {\theta ^Q} $ and $ {\theta ^\mu } $ of the critic network $ Q\left( \cdot |{\theta ^Q} \right) $ and the actor network $ \mu \left( \cdot |{\theta ^\mu } \right) $
     Initialize the parameters of the target critic and target actor networks: $ {\theta ^{Q'}} \leftarrow {\theta ^Q} $, $ {\theta ^{\mu '}} \leftarrow {\theta ^\mu } $
     Initialize the replay memory $ B $
     FOR episode = 1:L do
       Initialize a random process $ \mathcal{O} $ for action exploration
       Receive the initial observed state ${{\boldsymbol{s}}_1}$
       FOR t = 1:T do
         Select an action from the current policy plus exploration noise: ${{\boldsymbol{a}}_t} = \mu \left( {{{\boldsymbol{s}}_t}|{\theta ^\mu }} \right) + {\mathcal{O}_t}$
         Execute the action ${{\boldsymbol{a}}_t}$, obtain the reward $ {r_t} $, and observe the new state ${{\boldsymbol{s}}_{t + 1}}$
         Store the transition $\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{a}}_t},{r_t},{{\boldsymbol{s}}_{t + 1}}} \right)$ in the replay memory $ B $
         Randomly sample a minibatch of $ H $ transitions $\left( {{{\boldsymbol{s}}_i},{{\boldsymbol{a}}_i},{r_i},{{\boldsymbol{s}}_{i + 1}}} \right)$, $ i = 1, \ldots ,H $, from the replay memory $ B $
         Compute the target values with the target actor and target critic networks, with discount factor $ \varepsilon $:
         ${y_i} = {r_i} + \varepsilon Q'\left( {{{\boldsymbol{s}}_{i + 1}},\mu '\left( {{{\boldsymbol{s}}_{i + 1}}|{\theta ^{\mu '}}} \right)|{\theta ^{Q'}}} \right)$
         Update the critic network by minimizing the loss function:
         $\dfrac{1}{H}\displaystyle\sum\limits_{i = 1}^H {{{\left( {{y_i} - Q\left( {{{\boldsymbol{s}}_i},{{\boldsymbol{a}}_i}|{\theta ^Q}} \right)} \right)}^2}}$
         Compute the gradient of the critic output with respect to the action:
         ${\nabla _{\boldsymbol{a}}}Q\left( {{\boldsymbol{s}},{\boldsymbol{a}}|{\theta ^Q}} \right){|_{{\boldsymbol{s}} = {{\boldsymbol{s}}_i},{\boldsymbol{a}} = \mu \left( {{{\boldsymbol{s}}_i}|{\theta ^\mu }} \right)}}$
         Update the actor network parameters $ {\theta ^\mu } $ using the sampled policy gradient:
         $\dfrac{1}{H}\displaystyle\sum\limits_{i = 1}^H {{\nabla _{\boldsymbol{a}}}Q\left( {{\boldsymbol{s}},{\boldsymbol{a}}|{\theta ^Q}} \right){|_{{\boldsymbol{s}} = {{\boldsymbol{s}}_i},{\boldsymbol{a}} = \mu \left( {{{\boldsymbol{s}}_i}|{\theta ^\mu }} \right)}} \cdot {\nabla _{{\theta ^\mu }}}\mu \left( {{\boldsymbol{s}}|{\theta ^\mu }} \right){|_{{\boldsymbol{s}} = {{\boldsymbol{s}}_i}}}}$
         Update the target critic and target actor network parameters:
           $ {\theta ^{Q'}} \leftarrow \tau {\theta ^Q} + \left( {1 - \tau } \right){\theta ^{Q'}} $,
           $ {\theta ^{\mu '}} \leftarrow \tau {\theta ^\mu } + \left( {1 - \tau } \right){\theta ^{\mu '}} $,
         where $ \tau $ ($ 0 < \tau < 1 $) is the parameter update rate
       END FOR
     END FOR
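
     The inner-loop computations of the DDPG listing (target values, critic loss, sampled policy gradient, and soft target update) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the scalar linear actor and critic (`make_actor`, `make_critic`), the transition values, and the settings `discount=0.9` and `tau=0.01` are all hypothetical stand-ins chosen so the example runs without a deep learning library.

```python
import random
from collections import deque


def make_actor(theta_mu):
    """Toy linear actor mu(s) = theta_mu[0] * s (hypothetical stand-in)."""
    return lambda s: theta_mu[0] * s


def make_critic(theta_q):
    """Toy linear critic Q(s, a) = theta_q[0] * s + theta_q[1] * a."""
    return lambda s, a: theta_q[0] * s + theta_q[1] * a


def td_targets(batch, target_actor, target_critic, discount):
    """y_i = r_i + discount * Q'(s_{i+1}, mu'(s_{i+1})), using target networks."""
    return [r + discount * target_critic(s2, target_actor(s2))
            for (_s, _a, r, s2) in batch]


def critic_loss(batch, targets, critic):
    """Mean squared error (1/H) * sum_i (y_i - Q(s_i, a_i))^2."""
    return sum((y - critic(s, a)) ** 2
               for (s, a, _r, _s2), y in zip(batch, targets)) / len(batch)


def actor_gradient(batch, theta_q):
    """Sampled policy gradient for the linear stand-ins above.

    Here dQ/da = theta_q[1] and dmu/dtheta = s_i, both evaluated at s = s_i.
    """
    return sum(theta_q[1] * s for (s, _a, _r, _s2) in batch) / len(batch)


def soft_update(target, online, tau):
    """theta' <- tau * theta + (1 - tau) * theta' for each parameter."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


# Replay memory B holding transitions (s_t, a_t, r_t, s_{t+1}); values are made up.
replay = deque(maxlen=1000)
replay.extend([(1.0, 0.5, 2.0, 1.5), (1.5, 0.2, 1.0, 0.5), (0.5, 0.1, 0.0, 1.0)])

theta_q, theta_mu = [0.5, 1.0], [0.2]
theta_q_target, theta_mu_target = list(theta_q), list(theta_mu)

batch = random.Random(0).sample(list(replay), 2)  # minibatch with H = 2
targets = td_targets(batch, make_actor(theta_mu_target),
                     make_critic(theta_q_target), discount=0.9)
loss = critic_loss(batch, targets, make_critic(theta_q))
grad = actor_gradient(batch, theta_q)

# A real implementation would take optimizer steps on theta_q and theta_mu here,
# then let the target parameters track the updated online parameters.
theta_q_target = soft_update(theta_q_target, theta_q, tau=0.01)
theta_mu_target = soft_update(theta_mu_target, theta_mu, tau=0.01)
```

     In practice the actor and critic would be neural networks trained with automatic differentiation; the linear forms are used here only because their gradients can be written by hand.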

    Table 1  Parameters of the spectral jamming signals over the operating period

    Parameter               | $1 \le t \le 10$ | $11 \le t \le 15$ | $16 \le t \le 20$
    Jamming power (dB)      | 30, 20, 30       | 20, 30, 30        | 30, 25, 25
    Jamming spectrum index  | 1, 4, 6          | 2, 3, 5           | 1, 4, 6
    Jamming angle (°)       | 45, 45, 46       | 45, 47, 46        | 45, 44, 45
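
    For simulation purposes, the piecewise schedule in Table 1 could be encoded as a small lookup structure. The sketch below transcribes the table values; the names `JAMMING_SCHEDULE`, `jamming_at`, and `db_to_linear` are hypothetical, and reading the three values in each cell as belonging to three separate jamming signals is an assumption, since this page does not state it.

```python
# Jamming schedule transcribed from Table 1. Each cell's three values are
# assumed (not stated on this page) to describe three jamming signals.
JAMMING_SCHEDULE = [
    {"t_range": (1, 10), "power_db": (30, 20, 30),
     "spectrum_index": (1, 4, 6), "angle_deg": (45, 45, 46)},
    {"t_range": (11, 15), "power_db": (20, 30, 30),
     "spectrum_index": (2, 3, 5), "angle_deg": (45, 47, 46)},
    {"t_range": (16, 20), "power_db": (30, 25, 25),
     "spectrum_index": (1, 4, 6), "angle_deg": (45, 44, 45)},
]


def jamming_at(t):
    """Return the jamming parameters in force at integer time step t."""
    for segment in JAMMING_SCHEDULE:
        start, end = segment["t_range"]
        if start <= t <= end:
            return segment
    raise ValueError(f"time step {t} is outside the tabulated range 1..20")


def db_to_linear(power_db):
    """Convert a power level in dB to a linear power ratio."""
    return 10.0 ** (power_db / 10.0)


# Example: linear jamming powers at t = 12 (second column of the table).
linear_powers = [db_to_linear(p) for p in jamming_at(12)["power_db"]]
```

    An environment for the reinforcement learning agent could call `jamming_at(t)` at every step to build the jammer side of the state.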

    Table 2  Algorithm complexity

                              | Proposed algorithm | Interior-point method [20]
    Computational complexity  | $\mathcal{O}\left( {{N_{{\text{input}}}}{N_1} + {N_1}{N_2} + {N_2}{N_{{\text{output}}}}} \right)$ | $\mathcal{O}\left( {{{\left( {{N_{{\text{input}}}}} \right)}^{3.5}}\lg \left( {1/\varepsilon } \right)} \right)$
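
    To get a feel for how the two expressions in Table 2 scale, the leading terms can be evaluated numerically. This is a rough sketch under stated assumptions: $N_1$ and $N_2$ are read as hidden-layer widths of a fully connected network, $\lg$ is read as the base-10 logarithm, $\varepsilon$ is the solution accuracy of the interior-point method, and the constants hidden by the $\mathcal{O}$ notation are ignored. The function names and the example sizes are hypothetical.

```python
import math


def network_cost(n_input, n_hidden1, n_hidden2, n_output):
    """Leading term N_input*N_1 + N_1*N_2 + N_2*N_output from Table 2."""
    return n_input * n_hidden1 + n_hidden1 * n_hidden2 + n_hidden2 * n_output


def interior_point_cost(n_input, accuracy):
    """Leading term N_input^3.5 * lg(1/accuracy), with lg taken as log10."""
    return n_input ** 3.5 * math.log10(1.0 / accuracy)


# Hypothetical sizes: the network term grows linearly in the input dimension,
# while the interior-point term grows as its 3.5th power.
for n in (4, 16, 64):
    print(n, network_cost(n, 64, 64, n), round(interior_point_cost(n, 1e-3)))
```

    Because the hidden constants are dropped, the printed numbers indicate growth trends only and should not be read as operation counts.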
  • [1] ANTONIK P, WICKS M C, GRIFFITHS H D, et al. Frequency diverse array radars[C]. The 2006 IEEE Conference on Radar, Verona, Italy, 2006: 215–217.
    [2] WANG Wenqing. Overview of frequency diverse array in radar and navigation applications[J]. IET Radar, Sonar & Navigation, 2016, 10(6): 1001–1012. doi: 10.1049/iet-rsn.2015.0464
    [3] WANG Wenqing and SHAO Huaizong. Range-angle localization of targets by a double-pulse frequency diverse array radar[J]. IEEE Journal of Selected Topics in Signal Processing, 2014, 8(1): 106–114. doi: 10.1109/JSTSP.2013.2285528
    [4] DING Zihang, XIE Junwei, WANG Bo, et al. Robust adaptive null broadening method based on FDA-MIMO radar[J]. IEEE Access, 2020, 8: 177976–177983. doi: 10.1109/ACCESS.2020.3025602
    [5] XU Jingwei, LIAO Guisheng, ZHU Shengqi, et al. Joint range and angle estimation using MIMO radar with frequency diverse array[J]. IEEE Transactions on Signal Processing, 2015, 63(13): 3396–3410. doi: 10.1109/TSP.2015.2422680
    [6] WANG Bo, XIE Junwei, ZHANG Jing, et al. Dot-shaped beamforming analysis of subarray-based sin-FDA[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(10): 1429–1444. doi: 10.1631/FITEE.1800722
    [7] XIONG Jie, WANG Wenqing, SHAO Huaizong, et al. Frequency diverse array transmit beampattern optimization with genetic algorithm[J]. IEEE Antennas and Wireless Propagation Letters, 2016, 16: 469–472. doi: 10.1109/LAWP.2016.2584078
    [8] SAMMARTINO P F, BAKER C J, and GRIFFITHS H D. Frequency diverse MIMO techniques for radar[J]. IEEE Transactions on Aerospace and Electronic Systems, 2013, 49(1): 201–222. doi: 10.1109/TAES.2013.6404099
    [9] XU Jingwei, LIAO Guisheng, ZHU Shengqi, et al. Deceptive jamming suppression with frequency diverse MIMO radar[J]. Signal Processing, 2015, 113: 9–17. doi: 10.1016/j.sigpro.2015.01.014
    [10] LAN Lan, XU Jingwei, LIAO Guisheng, et al. Suppression of mainbeam deceptive jammer with FDA-MIMO radar[J]. IEEE Transactions on Vehicular Technology, 2020, 69(10): 11584–11598. doi: 10.1109/TVT.2020.3014689
    [11] LAN Lan, LIAO Guisheng, XU Jingwei, et al. Transceive beamforming with accurate nulling in FDA-MIMO radar for imaging[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(6): 4145–4159. doi: 10.1109/TGRS.2019.2961324
    [12] XU Jingwei, ZHU Shengqi, and LIAO Guisheng. Space-time-range adaptive processing for airborne radar systems[J]. IEEE Sensors Journal, 2015, 15(3): 1602–1610. doi: 10.1109/JSEN.2014.2364594
    [13] XU Jingwei, LIAO Guisheng, HUANG Lei, et al. Robust adaptive beamforming for fast-moving target detection with FDA-STAP radar[J]. IEEE Transactions on Signal Processing, 2017, 65(4): 973–984. doi: 10.1109/TSP.2016.2628340
    [14] HE Bin and SU Hongtao. A review of game theory analysis in cognitive radar anti-jamming[J]. Journal of Electronics & Information Technology, 2021, 43(5): 1199–1211. doi: 10.11999/JEIT200843 (in Chinese)
    [15] WU Jiale, SHI Chenguang, and ZHOU Jianjiang. A review of game theory application in radar system[J]. Aerodynamic Missile Journal, 2021(9): 59–66. (in Chinese)
    [16] SONG Xiufeng, WILLETT P, ZHOU Shengli, et al. The MIMO radar and jammer games[J]. IEEE Transactions on Signal Processing, 2012, 60(2): 687–699. doi: 10.1109/TSP.2011.2169251
    [17] DELIGIANNIS A, PANOUI A, LAMBOTHARAN S, et al. Game-theoretic power allocation and the Nash equilibrium analysis for a multistatic MIMO radar network[J]. IEEE Transactions on Signal Processing, 2017, 65(24): 6397–6408. doi: 10.1109/TSP.2017.2755591
    [18] GODRICH H, PETROPULU A P, and POOR H V. Power allocation strategies for target localization in distributed multiple-radar architectures[J]. IEEE Transactions on Signal Processing, 2011, 59(7): 3226–3240. doi: 10.1109/TSP.2011.2144976
    [19] DING Zihang and XIE Junwei. Joint transmit and receive beamforming for cognitive FDA-MIMO radar with moving target[J]. IEEE Sensors Journal, 2021, 21(18): 20878–20885. doi: 10.1109/JSEN.2021.3100332
    [20] LUO Zhiquan, MA W K, SO A M C, et al. Semidefinite relaxation of quadratic optimization problems[J]. IEEE Signal Processing Magazine, 2010, 27(3): 20–34. doi: 10.1109/MSP.2010.936019
Publication history
  • Received:  2021-12-22
  • Revised:  2022-02-24
  • Accepted:  2022-03-03
  • Published online:  2022-03-07
  • Issue date:  2023-02-07
