Multi-Mode Anti-Jamming for UAV Communications: A Cooperative Mode-Based Decision-Making Approach via Two-Dimensional Transfer Reinforcement Learning
Abstract: To address the vulnerability of Unmanned Aerial Vehicle (UAV) communications to jamming attacks in complex electromagnetic environments, this paper proposes a multi-mode cooperative anti-jamming architecture. By integrating Intelligent Frequency Hopping (IFH), Jamming-based Backscatter Communication (JBC), and Energy Harvesting (EH), the architecture builds a unified three-part defense system of "avoiding, utilizing, and converting" jamming, and a two-dimensional transfer learning mechanism is designed to solve the real-time decision-making problem on resource-constrained platforms. Along the task dimension, a cross-mode strategy-sharing network extracts common decision features and parallel Deep Q-Networks (DQN) learn the policy; along the anti-jamming-mode dimension, historical experience is reused to accelerate online learning. Simulation results show that the proposed scheme converges 64% faster than conventional deep reinforcement learning algorithms and keeps the communication interruption probability below 20% in dynamic jamming environments. By properly selecting the anti-jamming mode and channel, the system maintains efficient communication under different jamming patterns, achieving an optimal balance between anti-jamming performance and energy consumption.
Objective: With the widespread application of Unmanned Aerial Vehicles (UAVs) in military reconnaissance, logistics, and emergency communications, ensuring the security and reliability of UAV communication systems has become a critical challenge. Wireless channels are highly vulnerable to diverse jamming attacks, and traditional anti-jamming techniques, such as Frequency-Hopping Spread Spectrum (FHSS), are limited in dynamic spectrum environments and may be compromised by advanced machine learning algorithms. Furthermore, UAVs operate under strict constraints on onboard computational power and energy, which hinders the real-time use of complex anti-jamming algorithms. To address these challenges, this study proposes a multi-mode anti-jamming framework that integrates Intelligent Frequency Hopping (IFH), Jamming-based Backscatter Communication (JBC), and Energy Harvesting (EH) to strengthen communication resilience in complex electromagnetic environments. A Multi-mode Transfer Deep Q-Learning (MT-DQN) method is further proposed, enabling two-dimensional transfer to improve learning efficiency and adaptability under resource constraints. By leveraging transfer learning, the framework reduces computational load and accelerates decision-making, thereby allowing UAVs to counter jamming threats effectively even with limited resources.

Methods: The proposed framework adopts a multi-mode anti-jamming architecture that integrates IFH, JBC, and EH to establish a comprehensive defense strategy of "avoiding, utilizing, and converting" interference. The system is formulated as a Markov Decision Process (MDP) to dynamically optimize the selection of anti-jamming modes and communication channels. To address the challenges of high-dimensional state-action spaces and restricted onboard computational resources, a two-dimensional transfer reinforcement learning framework is developed. This framework comprises a cross-mode strategy-sharing network for extracting common features across different anti-jamming modes (Fig. 3) and a parallel network for cross-task transfer learning to adapt to variable task requirements (Fig. 4). The cross-mode strategy-sharing network accelerates convergence by reusing experiences, whereas the cross-task transfer learning network enables knowledge transfer under different task weightings. The reward function is designed to balance communication throughput and energy consumption; it guides the UAV to select the optimal anti-jamming strategy in real time based on spectrum sensing outcomes and task priorities.

Results and Discussions: The simulation results validate the effectiveness of the proposed MT-DQN. The dynamic weight allocation mechanism exhibits strong cross-task transfer capability (Fig. 6), as weight adjustments enable rapid convergence toward the corresponding optimal reward values (a minimal sketch of this mechanism follows the abstract). Compared with conventional Deep Reinforcement Learning (DRL) algorithms, the proposed method achieves a 64% faster convergence rate while keeping the probability of communication interruption below 20% in dynamic jamming environments (Fig. 7). The framework shows robust performance in throughput, convergence rate, and adaptability to variations in jamming patterns. Under comb-shaped and sweep-frequency jamming, the proposed method yields higher normalized throughput and faster convergence than the baseline DQN and other transfer learning-based approaches.
The results also indicate that MT-DQN improves stability and accelerates policy optimization during jamming-pattern switching (Fig. 8), highlighting its adaptability to abrupt changes in jamming patterns through transfer learning.

Conclusions: This study proposes a multi-mode anti-jamming framework that integrates IFH, JBC, and EH, thereby enhancing the communication capability of UAVs. The proposed solution shifts the paradigm from traditional jamming avoidance toward active jamming exploitation, repurposing jamming signals as covert carriers to overcome the limitations of conventional frequency-hopping systems. Simulation results confirm the advantages of the proposed method in throughput, convergence rate, and environmental adaptability, demonstrating stable communication quality even under complex electromagnetic conditions. Although DRL approaches are inherently constrained in handling completely random jamming without intrinsic patterns, this work improves adaptability to dynamic jamming through transfer learning and cross-mode strategy sharing. These findings provide a promising approach for countering complex jamming threats in UAV networks. Future work will focus on validating the proposed algorithm in hardware implementations and enhancing the robustness of DRL methods under highly non-stationary, though not entirely unpredictable, jamming conditions such as pseudo-random or adaptive interference.
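Before the algorithm itself, the dynamic weight allocation mentioned above can be made concrete: because separate networks estimate the capacity-oriented and energy-oriented action values, switching task priorities only re-mixes the two estimates with new weights $(a, b)$; the learned networks carry over unchanged. The following is a minimal sketch, in which all names and the example values are illustrative rather than the paper's:

```python
from typing import Sequence

# Hedged sketch of cross-task transfer via dynamic weight allocation: the two value
# estimates (capacity-oriented q_c, energy-oriented q_e) stay fixed, and a new task
# only changes the mixing weights (a, b) used at decision time.
def greedy_action(q_c: Sequence[float], q_e: Sequence[float], a: float, b: float) -> int:
    composite = [a * qc + b * qe for qc, qe in zip(q_c, q_e)]
    return max(range(len(composite)), key=composite.__getitem__)

# Example: a throughput-critical task (a=0.8, b=0.2) and an endurance-critical task
# (a=0.2, b=0.8) select different modes from the same learned values.
q_c = [2.0, 1.0, -1.0]   # per-action capacity values (e.g. IFH, JBC, EH)
q_e = [-1.0, 1.0, 2.0]   # per-action energy values
print(greedy_action(q_c, q_e, 0.8, 0.2))  # -> 0 (favor throughput: IFH)
print(greedy_action(q_c, q_e, 0.2, 0.8))  # -> 2 (favor energy: EH)
```

This is what allows rapid convergence to a new task's optimal reward after a weight change (Fig. 6): the transfer cost is a re-mix of existing value estimates, not a retrain.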
Algorithm 1 Multi-mode anti-jamming algorithm for UAV communications based on two-dimensional transfer reinforcement learning
Initialization: Construct convolutional neural networks $Q_{\theta_t^{\mathrm{C}}}^{\mathrm{C}}$ and $Q_{\theta_t^{\mathrm{E}}}^{\mathrm{E}}$ with randomly initialized weights $\theta_t^{\mathrm{C}}$, $\theta_t^{\mathrm{E}}$; create replay buffers C and D; set the mini-batch size $\mathrm{Batch}$, learning rate $\alpha$, discount factor $\gamma$, and weights $a$, $b$; initialize the environment state to $s_0$ and the initial action to $a_0$.
(1) Loop $t = 0, 1, \cdots, T$
(2) The agent executes action $a_t$, obtains rewards $r^{\mathrm{C}}(s_t, a_t)$ and $r^{\mathrm{E}}(s_t, a_t)$, and the environment transitions to state $s_{t+1}$;
(3) The agent observes $s_{t+1}$ and stores the experiences $(s_t, a_t, r_t^{\mathrm{C}}, s_{t+1})$ and $(s_t, a_t, r_t^{\mathrm{E}}, s_{t+1})$ in replay buffers C and D, respectively;
(4) From $(s_t, a_t, r_t^{\mathrm{C}}, s_{t+1})$, derive the experiences of the other two modes, $(s_t, a_{t'}, r_t^{\mathrm{C}'}, s_{t+1})$ and $(s_t, a_{t''}, r_t^{\mathrm{C}''}, s_{t+1})$, and store them in buffer C;
(5) From $(s_t, a_t, r_t^{\mathrm{E}}, s_{t+1})$, derive the experiences of the other two modes, $(s_t, a_{t'}, r_t^{\mathrm{E}'}, s_{t+1})$ and $(s_t, a_{t''}, r_t^{\mathrm{E}''}, s_{t+1})$, and store them in buffer D;
(6) if the number of samples in buffers C and D $\ge \mathrm{Batch}$
(7) the agent randomly draws $\mathrm{Batch}$ samples from the buffers and, for each sampled experience:
  (a) feeds the observed state $s_t$ into the networks $Q_{\theta_t^{\mathrm{C}}}^{\mathrm{C}}$, $Q_{\theta_t^{\mathrm{E}}}^{\mathrm{E}}$ to obtain the target Q values $\mathrm{Target}\,Q^{\mathrm{C}}$, $\mathrm{Target}\,Q^{\mathrm{E}}$;
  (b) feeds $s_{t+1}$ into the networks to obtain the predicted Q values $Q_{\theta_t^{\mathrm{C}}}^{\mathrm{C}}(s_{t+1}, a_t)$, $Q_{\theta_t^{\mathrm{E}}}^{\mathrm{E}}(s_{t+1}, a_t)$;
  (c) updates the targets:
$\mathrm{Target}\,Q^{\mathrm{C}} = (1-\alpha)\,\mathrm{Target}\,Q^{\mathrm{C}} + \alpha \left[ r^{\mathrm{C}}(s_t, a_t) + \gamma \max\limits_{a \in \mathcal{A}} Q_{\theta_t^{\mathrm{C}}}^{\mathrm{C}}(s_{t+1}, a) \right]$
$\mathrm{Target}\,Q^{\mathrm{E}} = (1-\alpha)\,\mathrm{Target}\,Q^{\mathrm{E}} + \alpha \left[ r^{\mathrm{E}}(s_t, a_t) + \gamma \max\limits_{a \in \mathcal{A}} Q_{\theta_t^{\mathrm{E}}}^{\mathrm{E}}(s_{t+1}, a) \right]$
(8) Compute the loss functions $L(\theta_t^{\mathrm{C}})$, $L(\theta_t^{\mathrm{E}})$ and update the network weights to $\theta_t^{\mathrm{C}'}$, $\theta_t^{\mathrm{E}'}$;
(9) The agent feeds the current observed state $s_{t+1}$ into the updated networks $Q_{\theta_t^{\mathrm{C}'}}^{\mathrm{C}}$, $Q_{\theta_t^{\mathrm{E}'}}^{\mathrm{E}}$ to obtain the action-value functions $Q_{\theta_t^{\mathrm{C}'}}^{\mathrm{C}}(s_{t+1}, a)$, $Q_{\theta_t^{\mathrm{E}'}}^{\mathrm{E}}(s_{t+1}, a)$;
(10) Based on the weighted average of the composite action-value functions, the agent selects the next action by a pure greedy policy: $a_{t+1} = \arg\max\limits_{a \in \mathcal{A}} \left[ a \cdot Q_{\theta_t^{\mathrm{C}'}}^{\mathrm{C}}(s_{t+1}, a) + b \cdot Q_{\theta_t^{\mathrm{E}'}}^{\mathrm{E}}(s_{t+1}, a) \right]$;
(11) Update $s_t = s_{t+1}$, $a_t = a_{t+1}$;
(12) End Loop
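For concreteness, steps (4)-(10) admit a compact implementation. The following is a minimal, illustrative Python sketch under assumed shapes (a flattened sensing-state vector and a discrete joint mode-channel action space) rather than the paper's convolutional architecture; every identifier (q_net_c, extend_and_store, td_update, ...) is my own naming, not the authors'.

```python
import random
import torch
import torch.nn as nn

MODES = 3                 # IFH, JBC, EH
CHANNELS = 10             # assumed number of channels
N_ACTIONS = MODES * CHANNELS
STATE_DIM = 100           # assumed flattened sensing-history length
GAMMA, ALPHA = 0.9, 0.5   # discount factor and learning rate (Table 1)
A_W, B_W = 0.5, 0.5       # task weights a, b on the capacity/energy objectives

def make_q_net() -> nn.Module:
    # Stand-in for the paper's convolutional Q-networks Q^C and Q^E.
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, N_ACTIONS))

q_net_c, q_net_e = make_q_net(), make_q_net()
opt_c = torch.optim.Adam(q_net_c.parameters(), lr=1e-3)
opt_e = torch.optim.Adam(q_net_e.parameters(), lr=1e-3)
buffer_c, buffer_d = [], []   # replay buffers C and D

def extend_and_store(s, channel, rewards_c, rewards_e, s_next):
    # Steps (4)-(5): the sensing outcome reveals what every mode would have earned
    # on the same channel, so one real transition yields three stored experiences.
    for m in range(MODES):
        a = torch.tensor(m * CHANNELS + channel)
        buffer_c.append((s, a, torch.tensor(rewards_c[m]), s_next))
        buffer_d.append((s, a, torch.tensor(rewards_e[m]), s_next))

def td_update(buffer, q_net, opt, batch_size=128):
    # Steps (7)-(8): soft target (1-alpha)*Q_old + alpha*(r + gamma*max_a Q(s',a)).
    s, a, r, s_next = map(torch.stack, zip(*random.sample(buffer, batch_size)))
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = (1 - ALPHA) * q_sa + ALPHA * (r + GAMMA * q_net(s_next).max(1).values)
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()

def select_action(s_next: torch.Tensor) -> int:
    # Step (10): pure greedy choice on the weighted composite action-value function.
    with torch.no_grad():
        q = A_W * q_net_c(s_next) + B_W * q_net_e(s_next)
    return int(q.argmax(dim=1))
```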
Table 1 Simulation parameter settings
Parameter | Value
Spectrum sensing history length | $L = 100\ \mathrm{ms}$
Training batch size | $\mathrm{Batch} = 128$
Replay buffer capacity | $D = 5\;000$
Learning rate | $\alpha = 0.5$
Discount factor | $\gamma = 0.9$
Path-loss factor | $\mu = 2$
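Expressed as code, Table 1's settings map onto a single configuration object; a minimal sketch with field names of my own choosing:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationConfig:
    sensing_history_ms: int = 100    # L: spectrum-sensing history length
    batch_size: int = 128            # Batch: training mini-batch size
    buffer_capacity: int = 5_000     # D: replay buffer capacity
    learning_rate: float = 0.5       # alpha: Q-value update step size
    discount_factor: float = 0.9     # gamma
    path_loss_exponent: float = 2.0  # mu: path-loss factor

cfg = SimulationConfig()
```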
Table 2 Reward parameter settings
Symbol | Meaning | Value
$C^{\mathrm{H}}/C^{\mathrm{H}'}$ | IFH-mode channel capacity reward/penalty (communication success/failure) | 2 / –1
$C^{\mathrm{B}}/C^{\mathrm{B}'}$ | JBC-mode channel capacity reward/penalty (communication success/failure) | 1 / –1
$C^{\mathrm{E}}$ | EH-mode channel capacity penalty | –1
$E^{\mathrm{H}}$ | IFH-mode energy consumption penalty | –1
$E^{\mathrm{B}}/E^{\mathrm{B}'}$ | JBC-mode energy reward/penalty (communication success/failure) | 1 / –1
$E^{\mathrm{E}}/E^{\mathrm{E}'}$ | EH-mode energy harvesting reward/penalty (harvesting success/failure) | 2 / –1
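Read as a pair, the capacity and energy columns define a dual reward for each mode and outcome. The sketch below wires Table 2's values into one environment step; the success conditions (IFH succeeding on a clean channel, JBC and EH succeeding exactly when the chosen channel is jammed) are assumptions made for illustration, as are all names.

```python
import numpy as np

# (r_capacity, r_energy) for success / failure of each mode, per Table 2.
REWARDS = {
    "IFH": {True: (2.0, -1.0), False: (-1.0, -1.0)},  # C^H / C^H', E^H
    "JBC": {True: (1.0, 1.0),  False: (-1.0, -1.0)},  # C^B / C^B', E^B / E^B'
    "EH":  {True: (-1.0, 2.0), False: (-1.0, -1.0)},  # C^E, E^E / E^E'
}

def step(state: np.ndarray, mode: str, channel: int, jammed: np.ndarray):
    """One decision slot: return (next_state, r_capacity, r_energy)."""
    hit = bool(jammed[channel])
    success = hit if mode in ("JBC", "EH") else not hit  # assumed success conditions
    r_cap, r_energy = REWARDS[mode][success]
    # Slide the L-slot sensing window forward by one observation.
    next_state = np.vstack([state[1:], jammed[None, :]])
    return next_state, r_cap, r_energy
```

Under these assumptions, JBC and EH turn slots that would be losses for IFH into usable reward, which is the "utilize/convert" half of the defense system.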