Multi-Mode Anti-Jamming for UAV Communications: A Cooperative Mode-Based Decision-Making Approach via Two-Dimensional Transfer Reinforcement Learning

WANG Shiyu, WANG Ximing, KE Zhenyi, LIU Dianxiong, LIU Jize, DU Zhiyong

Citation: WANG Shiyu, WANG Ximing, KE Zhenyi, LIU Dianxiong, LIU Jize, DU Zhiyong. Multi-Mode Anti-Jamming for UAV Communications: A Cooperative Mode-Based Decision-Making Approach via Two-Dimensional Transfer Reinforcement Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250566


doi: 10.11999/JEIT250566 cstr: 32379.14.JEIT250566
Funds: The National Natural Science Foundation of China (62201581, 62471473)
Details
    About the authors:

    WANG Shiyu: Female. Master's student. Her research interest is intelligent anti-jamming communications.

    WANG Ximing: Male. Lecturer. His research interest is intelligent anti-jamming communications.

    KE Zhenyi: Female. Master's student. Her research interest is intelligent anti-jamming communications.

    LIU Dianxiong: Male. Senior engineer. His research interest is tactical wireless communications.

    LIU Jize: Male. Undergraduate student. His research interests include cooperative machine learning and anti-jamming communications.

    DU Zhiyong: Male. Associate professor. His research interests include UAV communications and intelligent wireless communications.

    Corresponding author:

    DU Zhiyong, duzhiyong2010@gmail.com

  • CLC number: TN929.5; TP181

  • Abstract: To address the vulnerability of Unmanned Aerial Vehicle (UAV) communications to jamming attacks in complex electromagnetic environments, this paper proposes a multi-mode cooperative anti-jamming architecture. By integrating Intelligent Frequency Hopping (IFH), Jamming-Based backscatter Communication (JBC), and Energy Harvesting (EH), it builds a three-pronged "evade-exploit-convert" defense, and it designs a two-dimensional transfer learning mechanism to meet the real-time decision-making demands of resource-constrained platforms. Along the task dimension, a cross-mode policy-sharing network is built to extract common decision features and parallel Deep Q-Networks (DQN) are designed for policy learning; along the anti-jamming mode dimension, historical experience is reused to accelerate online learning. Simulation results show that the proposed scheme converges 64% faster than conventional deep reinforcement learning algorithms, and that the communication outage probability remains below 20% in dynamic jamming environments. By properly selecting the anti-jamming mode and channel, the system maintains efficient communication under different jamming patterns and achieves an optimal balance between anti-jamming performance and energy consumption.
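
To make the "evade-exploit-convert" decision space concrete, here is a minimal sketch (our own illustration; the channel count and the flat-index encoding are assumptions, not taken from the paper) of the joint mode-channel action set the agent chooses from at each slot:

```python
from enum import Enum
from itertools import product

class Mode(Enum):
    IFH = 0  # evade: Intelligent Frequency Hopping away from the jammer
    JBC = 1  # exploit: Jamming-based Backscatter Communication rides the jamming signal
    EH = 2   # convert: Energy Harvesting turns jamming power into stored energy

N_CHANNELS = 10  # assumed number of available channels

# Joint action space: every (anti-jamming mode, channel) pair, flattened so
# that a single DQN output index selects both decisions at once.
ACTIONS = list(product(Mode, range(N_CHANNELS)))

def decode(action_index: int) -> tuple:
    """Map a flat DQN output index back to a (mode, channel) decision."""
    return ACTIONS[action_index]
```

A Q-network with `len(ACTIONS)` outputs then selects mode and channel jointly, which is what allows one network to share decision features across the three modes.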
  • Figure 1  Multi-mode anti-jamming scenario

    Figure 2  System architecture

    Figure 3  DQN-based cross-mode transfer learning mechanism

    Figure 4  DQN-based cross-task transfer learning mechanism

    Figure 5  Convergence of the MT-DQN return for different tasks

    Figure 6  Convergence of different DQN algorithms in the jamming scenario

    Figure 7  Convergence of different DQN algorithms in the dynamic jamming scenario

    Algorithm 1  Multi-mode anti-jamming algorithm for UAV communications based on two-dimensional transfer reinforcement learning

     Initialization: create the convolutional neural networks $Q_{\theta _t^{\rm C}}^{\rm C}$, $Q_{\theta _t^{\rm E}}^{\rm E}$ with randomly initialized weights $\theta _t^{\rm C}$, $\theta _t^{\rm E}$; create the replay buffers C and D; set the mini-batch size $\mathrm{Batch}$, learning rate $\alpha$, discount factor $\gamma$, and weights $a$, $b$; initialize the environment state to $s_0$ and the initial action to $a_0$
     (1) Loop $t = 0,1, \cdots ,T$
     (2) The agent executes action $a_t$, obtains the rewards $r^{\rm C}(s_t,a_t)$ and $r^{\rm E}(s_t,a_t)$, and the environment transitions to state $s_{t+1}$;
     (3) The agent observes $s_{t+1}$ and stores the experience tuples $(s_t,a_t,r_t^{\rm C},s_{t+1})$ and $(s_t,a_t,r_t^{\rm E},s_{t+1})$ in replay buffers C and D, respectively;
     (4) From $(s_t,a_t,r_t^{\rm C},s_{t+1})$, derive the experiences $(s_t,a_{t'},r_t^{{\rm C}'},s_{t+1})$ and $(s_t,a_{t''},r_t^{{\rm C}''},s_{t+1})$ of the other two modes and store them in replay buffer C;
     (5) From $(s_t,a_t,r_t^{\rm E},s_{t+1})$, derive the experiences $(s_t,a_{t'},r_t^{{\rm E}'},s_{t+1})$ and $(s_t,a_{t''},r_t^{{\rm E}''},s_{t+1})$ of the other two modes and store them in replay buffer D;
     (6) if the numbers of samples in replay buffers C and D $\ge \mathrm{Batch}$
     (7) The agent randomly draws $\mathrm{Batch}$ samples from the replay buffers and, for each sampled tuple:
      (a) feeds the observed state $s_t$ into the convolutional neural networks $Q_{\theta _t^{\rm C}}^{\rm C}$, $Q_{\theta _t^{\rm E}}^{\rm E}$ to obtain the target Q values $\mathrm{Target}\,Q^{\rm C}$, $\mathrm{Target}\,Q^{\rm E}$;
      (b) feeds the observed state $s_{t+1}$ into the networks to obtain the predicted Q values $Q_{\theta _t^{\rm C}}^{\rm C}(s_{t+1},a_t)$, $Q_{\theta _t^{\rm E}}^{\rm E}(s_{t+1},a_t)$;
      (c) $\mathrm{Target}\,Q^{\rm C} = \mathrm{Target}\,Q^{\rm C} \cdot (1-\alpha) + \alpha \cdot \left[r^{\rm C}(s_t,a_t) + \gamma \mathop{\max}\limits_{a \in \mathcal{A}} Q_{\theta _t^{\rm C}}^{\rm C}(s_{t+1},a)\right]$
       $\mathrm{Target}\,Q^{\rm E} = \mathrm{Target}\,Q^{\rm E} \cdot (1-\alpha) + \alpha \cdot \left[r^{\rm E}(s_t,a_t) + \gamma \mathop{\max}\limits_{a \in \mathcal{A}} Q_{\theta _t^{\rm E}}^{\rm E}(s_{t+1},a)\right]$
     (8) Compute the loss functions $L(\theta _t^{\rm C})$, $L(\theta _t^{\rm E})$ and update the network weights to $\theta _t^{{\rm C}'}$, $\theta _t^{{\rm E}'}$;
     (9) The agent feeds the current observed state $s_{t+1}$ into the networks $Q_{\theta _t^{{\rm C}'}}^{\rm C}$, $Q_{\theta _t^{{\rm E}'}}^{\rm E}$ to obtain the action-value functions $Q_{\theta _t^{{\rm C}'}}^{\rm C}(s_{t+1},a)$, $Q_{\theta _t^{{\rm E}'}}^{\rm E}(s_{t+1},a)$;
     (10) Based on the weighted average of the composite action-value functions, the agent greedily selects the action $a_{t+1} = \arg\mathop{\max}\limits_{a \in \mathcal{A}} \left[a \cdot Q_{\theta _t^{{\rm C}'}}^{\rm C}(s_{t+1},a) + b \cdot Q_{\theta _t^{{\rm E}'}}^{\rm E}(s_{t+1},a)\right]$;
     (11) Update $s_t = s_{t+1}$, $a_t = a_{t+1}$;
     (12) End Loop
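
For readers who want to prototype Algorithm 1, the following is a minimal PyTorch sketch of the parallel-DQN update and the weighted greedy selection of steps (7)-(10). It is a sketch under assumptions, not the authors' code: the state is taken to be a 1×L×F spectrum-waterfall tensor, the mode-experience expansion of steps (4)-(5) is omitted, the $\alpha$-blended target of step (7c) is folded into the optimizer's step size as in standard DQN, and the fusion weights $a$, $b$ are set arbitrarily.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA = 0.9          # discount factor (Table 1)
BATCH = 128          # mini-batch size (Table 1)
CAPACITY = 5000      # replay buffer capacity (Table 1)
A_W, B_W = 0.5, 0.5  # assumed values for the fusion weights a, b

class QNet(nn.Module):
    """Small CNN mapping a sensed spectrum waterfall to joint-action Q values."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.head = nn.Linear(16 * 4 * 4, n_actions)

    def forward(self, s):
        return self.head(self.features(s))

# One replay buffer per reward head: C (communication) and D (energy).
buf_c: deque = deque(maxlen=CAPACITY)
buf_d: deque = deque(maxlen=CAPACITY)

def train_step(q: QNet, opt: torch.optim.Optimizer, buf: deque) -> None:
    """Steps (6)-(8) for one head: sample a batch and do a Q-learning update."""
    if len(buf) < BATCH:
        return
    batch = random.sample(list(buf), BATCH)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s1 = torch.stack([b[3] for b in batch])
    with torch.no_grad():  # bootstrap target r + gamma * max_a Q(s', a)
        target = r + GAMMA * q(s1).max(dim=1).values
    pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(pred, target)  # loss L(theta) of step (8)
    opt.zero_grad()
    loss.backward()
    opt.step()

def select_action(q_c: QNet, q_e: QNet, s1: torch.Tensor) -> int:
    """Step (10): greedy choice over the weighted composite action values."""
    with torch.no_grad():
        fused = A_W * q_c(s1.unsqueeze(0)) + B_W * q_e(s1.unsqueeze(0))
    return int(fused.argmax())
```

Each slot would then append $(s_t,a_t,r_t^{\rm C},s_{t+1})$ to `buf_c` and $(s_t,a_t,r_t^{\rm E},s_{t+1})$ to `buf_d` before calling `train_step` on both heads.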

    Table 1  Simulation parameter settings

    Parameter  Value
    Spectrum sensing history length  $L = 100\ \mathrm{ms}$
    Mini-batch size  $\mathrm{Batch} = 128$
    Replay buffer capacity  $D = 5\,000$
    Learning rate  $\alpha = 0.5$
    Discount factor  $\gamma = 0.9$
    Path loss exponent  $\mu = 2$
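
As a convenience for re-implementation, the settings in Table 1 map onto one small configuration object; this is only a transcription sketch, and the field names are ours, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimConfig:
    """Simulation hyperparameters from Table 1 (field names are our own)."""
    sensing_history_ms: int = 100      # L: spectrum sensing history length
    batch_size: int = 128              # Batch: mini-batch size per update
    buffer_capacity: int = 5000        # D: replay buffer capacity
    learning_rate: float = 0.5         # alpha
    discount_factor: float = 0.9       # gamma
    path_loss_exponent: float = 2.0    # mu

CFG = SimConfig()
```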

    Table 2  Reward parameter settings

    Symbol  Meaning  Value
    $C^{\rm H}/C^{{\rm H}'}$  IFH-mode channel capacity reward/penalty (communication success/failure)  2/–1
    $C^{\rm B}/C^{{\rm B}'}$  JBC-mode channel capacity reward/penalty (communication success/failure)  1/–1
    $C^{\rm E}$  EH-mode channel capacity penalty  –1
    $E^{\rm H}$  IFH-mode energy consumption penalty  –1
    $E^{\rm B}/E^{{\rm B}'}$  JBC-mode energy consumption reward/penalty (communication success/failure)  1/–1
    $E^{\rm E}/E^{{\rm E}'}$  EH-mode energy harvesting reward/penalty (harvesting success/failure)  2/–1
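
Table 2 pins down the two reward heads fed to replay buffers C and D in Algorithm 1. Below is a minimal encoding of the table (our own reading; treating the EH capacity penalty and the IFH energy penalty as outcome-independent is an interpretation):

```python
# Reward tables keyed by (mode, success). Values follow Table 2.
CAPACITY_REWARD = {
    ("IFH", True): 2, ("IFH", False): -1,   # C^H / C^H'
    ("JBC", True): 1, ("JBC", False): -1,   # C^B / C^B'
    ("EH", True): -1, ("EH", False): -1,    # C^E: no transmission while harvesting
}
ENERGY_REWARD = {
    ("IFH", True): -1, ("IFH", False): -1,  # E^H: hopping always costs energy
    ("JBC", True): 1,  ("JBC", False): -1,  # E^B / E^B'
    ("EH", True): 2,   ("EH", False): -1,   # E^E / E^E'
}

def rewards(mode: str, success: bool) -> tuple:
    """Return (r_C, r_E): the channel-capacity and energy reward components."""
    return CAPACITY_REWARD[(mode, success)], ENERGY_REWARD[(mode, success)]
```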
Publication history
  • Received: 2025-06-19
  • Revised: 2025-09-09
  • Published online: 2025-09-12
