Link State Awareness Enhanced Intelligent Routing Algorithm for Tactical Communication Networks
Abstract: Existing deep reinforcement learning based routing algorithms adopt a single neural network structure and therefore cannot fully perceive the complex dependencies among link states, which limits the accuracy and robustness of their routing decisions under time-varying network conditions. To address this problem, this paper proposes a link state awareness enhanced intelligent routing algorithm for tactical communication networks (DRL-SGA). On the basis of collecting network state sequences with a Proximal Policy Optimization (PPO) agent, the algorithm constructs a link state perception enhancement module that replaces the Fully Connected Neural Network (FCNN) in PPO, so as to capture the spatiotemporal dependencies among network state sequences and improve the adaptability of the routing decision model to time-varying network states. Furthermore, the actions output by the link state perception enhancement module interact with the network environment periodically to explore optimal routes that satisfy the differentiated transmission requirements of heterogeneous services, such as latency-sensitive, bandwidth-sensitive, and reliability-sensitive services. Experimental results show that, compared with baseline routing algorithms including OSPF, DQN, DDPG, A3C, and DRL-ST, the proposed DRL-SGA routing algorithm achieves advantages of varying degrees in average end-to-end delay, average network throughput, and average packet loss rate, and adapts better to complex scenarios such as constrained bandwidth resources and dynamically changing topologies.

Abstract:
Objective Operational concept iteration, combat style innovation, and the emergence of new combat forces are accelerating the transition of warfare toward intelligent systems. In this context, tactical communication networks must establish end-to-end transmission paths through heterogeneous links, including ultra-shortwave and satellite communications, to meet differentiated routing requirements for multi-modal services sensitive to latency, bandwidth, and reliability. Existing Deep Reinforcement Learning (DRL)-based intelligent routing algorithms primarily use single neural network architectures, which inadequately capture complex dependencies among link states. This limitation reduces the accuracy and robustness of routing decisions under time-varying network conditions. To address this, a link state perception-enhanced intelligent routing algorithm (DRL-SGA) is proposed. By capturing spatiotemporal dependencies in link state sequences, the algorithm improves the adaptability of routing decision models to dynamic network conditions and enables more effective path selection for multi-modal service transmission.

Methods The proposed DRL-SGA algorithm incorporates a link state perception enhancement module that integrates a Graph Neural Network (GNN) and an attention mechanism into a Proximal Policy Optimization (PPO) agent framework for collecting network state sequences. This module extracts high-order features from the sequences across temporal and spatial dimensions, thereby addressing the limited global link state awareness of the PPO agent’s Fully Connected Neural Network (FCNN). This enhancement improves the adaptability of the routing decision model to time-varying network conditions. The Actor-Critic framework enables periodic interaction between the agent and the network environment, while an experience replay pool continuously refines policy parameters. This process facilitates the discovery of routing paths that meet heterogeneous transmission requirements across latency-, bandwidth-, and reliability-sensitive services.

Results and Discussions The routing decision capability of the DRL-SGA algorithm is evaluated in a simulated network comprising 47 routing nodes and 61 communication links. Its performance is compared with that of five other routing algorithms under varying traffic intensities. The results show that DRL-SGA provides superior adaptability to heterogeneous network environments. At a traffic intensity of 100 kbit/s, DRL-SGA reduces latency by 14.42%–33.57% compared with the other algorithms (Figure 4). Network throughput increases by 2.51%–23.41% (Figure 5). In scenarios characterized by resource constraints or topological changes, DRL-SGA consistently maintains higher service quality and greater adaptability to fluctuations in network state (Figures 7–12). Ablation experiments confirm the effectiveness of the individual components within the link state perception enhancement module in improving the algorithm’s perception capability (Table 3).

Conclusions A link state perception-enhanced intelligent routing algorithm (DRL-SGA) is proposed for tactical communication networks. By extracting high-order features from link state sequences across temporal and spatial dimensions, the algorithm addresses the limited global link state awareness of the PPO agent’s FCNN.
Through the Actor-Critic framework and periodic interactions between the agent and the network environment, DRL-SGA enables iterative optimization of routing strategies, improving decision accuracy and robustness under dynamic topology and link conditions. Experimental results show that DRL-SGA meets the differentiated transmission requirements of heterogeneous services (latency-sensitive, bandwidth-sensitive, and reliability-sensitive), while offering improved adaptability to variations in network state. However, the algorithm may exhibit delayed convergence when training samples are insufficient in rapidly changing environments. Future work will examine the integration of diffusion models to enrich training data and accelerate convergence.
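To make the Methods description concrete, the following is a minimal PyTorch sketch of a link state perception enhancement module of this kind: one GAT layer aggregates spatial features over the topology graph, a GRU models the temporal evolution of the state sequence, and self-attention weighs the time steps before the Actor and Critic heads. The class name, tensor shapes, and layer sizes are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch (assumed shapes and names) of a GAT + GRU + self-attention
# perception module feeding PPO Actor/Critic heads.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is installed


class LinkStatePerceptionModule(nn.Module):
    def __init__(self, num_features, hidden_dim=128, num_paths=10):
        super().__init__()
        self.gat = GATConv(num_features, hidden_dim, heads=1)        # spatial features over the topology
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # temporal features over the sequence
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.actor_head = nn.Linear(hidden_dim, num_paths)           # one logit per candidate path
        self.critic_head = nn.Linear(hidden_dim, 1)                  # state value V(s)

    def forward(self, state_seq, edge_index):
        # state_seq: (seq_len, num_nodes, num_features) snapshots forming s_t;
        # edge_index: (2, num_edges) connectivity of the network topology.
        spatial = [self.gat(x, edge_index).mean(dim=0) for x in state_seq]  # graph-level embedding per step
        h, _ = self.gru(torch.stack(spatial).unsqueeze(0))                  # (1, seq_len, hidden_dim)
        h, _ = self.attn(h, h, h)                                           # weigh informative time steps
        summary = h[:, -1, :]                                               # last-step summary feature
        return self.actor_head(summary), self.critic_head(summary)
```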
Algorithm 1 The DRL-SGA algorithm
Input: reward discount factor $\gamma$, Actor learning rate $\lambda_1$, Critic learning rate $\lambda_2$, total number of training episodes $T$, interaction frequency $N_{\mathrm{step}}$, experience replay pool capacity $M$, number of experience samples $D$, network state sequence $\boldsymbol{s}_t$
Output: globally optimal transmission path
1: Initialize the Actor policy network parameters $\theta$ and the Critic value network parameters $\varphi$;
2: Initialize the experience replay pool with capacity $M$;
3: for episode = 1 to $T$ do:
4:  At time $t$, the agent obtains the initial network state sequence $\boldsymbol{s}_t = [\boldsymbol{x}_{t-l+1}, \boldsymbol{x}_{t-l+2}, \cdots, \boldsymbol{x}_t]$;
5:  for $t = 1$ to $N_{\mathrm{step}}$:
6:   The Actor network generates the optimal-path action $\boldsymbol{a}_t$ according to policy $\pi_\theta$ and executes it;
7:   The agent receives the reward $r_t$ and the new network state sequence $\boldsymbol{s}_{t+1}$;
8:   Store the experience sample $(\boldsymbol{s}_t, \boldsymbol{a}_t, r_t, \boldsymbol{s}_{t+1})$ in the experience replay pool;
9:   Update the state sequence $\boldsymbol{s}_t \leftarrow \boldsymbol{s}_{t+1}$;
10: end for
11: if $\mathrm{len}(M) > D$:
12:  for $i = 1$ to $D$:
13:   Feed the sampled experience $(\boldsymbol{s}_i, \boldsymbol{a}_i, r_i, \boldsymbol{s}_{i+1})$ into the Critic network to obtain all state values $V(\boldsymbol{s}_i)$;
14:   Compute the advantage function $\hat{A}$ according to Eq. (14) and update the Critic network parameters $\varphi$ via backpropagation;
15:   Compute the objective function $L^{\mathrm{clip}}$ according to Eq. (16) and update the Actor network parameters $\theta$ via backpropagation;
16:  end for
17: end if
18: end for
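For concreteness, the main loop of Algorithm 1 can be sketched as follows. This is a minimal illustration that assumes a one-step TD advantage for Eq. (14) and the standard PPO clipped surrogate for Eq. (16); `env` (with reset/step) and `policy` (returning path logits and a state value, e.g. the perception module sketched above) are hypothetical interfaces, and a single joint optimizer replaces the separate Actor and Critic updates for brevity.

```python
# Minimal sketch of the Algorithm 1 training loop under the assumptions stated above.
import random
import torch


def train_drl_sga(env, policy, optimizer, T=200, N_step=30, M=5000, D=32,
                  gamma=0.9, eps_clip=0.2):
    replay_pool = []                                       # experience replay pool with capacity M
    for episode in range(T):
        s = env.reset()                                    # initial network state sequence s_t
        for _ in range(N_step):                            # periodic interaction with the environment
            logits, _ = policy(s)
            dist = torch.distributions.Categorical(logits=logits)
            a = dist.sample()                              # index of the chosen candidate path
            s_next, r = env.step(a.item())                 # execute the route, observe reward r_t
            replay_pool.append((s, a, r, s_next, dist.log_prob(a).detach()))
            if len(replay_pool) > M:                       # keep the pool bounded
                replay_pool.pop(0)
            s = s_next
        if len(replay_pool) > D:
            for s_i, a_i, r_i, s_next_i, logp_old in random.sample(replay_pool, D):
                logits, v = policy(s_i)
                _, v_next = policy(s_next_i)
                advantage = (r_i + gamma * v_next - v).detach()           # Eq. (14), one-step TD form
                critic_loss = (r_i + gamma * v_next.detach() - v).pow(2)  # value regression loss
                logp = torch.distributions.Categorical(logits=logits).log_prob(a_i)
                ratio = torch.exp(logp - logp_old)
                clipped = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip)
                actor_loss = -torch.min(ratio * advantage, clipped * advantage)  # Eq. (16), clipped objective
                optimizer.zero_grad()
                (actor_loss + critic_loss).sum().backward()               # update θ and φ jointly
                optimizer.step()
```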
Table 1 Link types and corresponding bandwidths in the tactical communication network

Network type | Link type | Link bandwidth (Mbit/s)
Backbone network | Microwave, troposcatter, regional broadband, satellite, HF, VHF, UHF, field wire, data link | 1–10
Brigade command post | Field wire, VHF, UHF, regional broadband | 1–10
 | Optical fiber | 622.08
Combat unit 1 | UHF, VHF | 1
Combat unit 2 | UHF, VHF | 1
Combat unit 3 | UHF, VHF | 1
Combat unit 4 | UHF, VHF | 1
Combat unit 5 | UHF, VHF | 1
Table 2 DRL-SGA parameter settings

Parameter | Value
Optimizer | Adam
Actor learning rate $\lambda_1$ | 0.001
Critic learning rate $\lambda_2$ | 0.001
Reward discount factor $\gamma$ | 0.9
Clipping factor $\varepsilon$ | 0.2
Experience replay pool capacity $M$ | 5000
Number of experience samples $D$ | 32
Total number of training episodes $T$ | 200
Interaction frequency $N_{\mathrm{step}}$ | 30
Number of GRU units | 3
Number of GAT units | 1
MLP hidden layer dimension | 128
Number of feasible paths $k$ | 10
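For reference, the Table 2 settings can be collected into a single configuration object; the key names below are illustrative and not taken from the paper's code.

```python
# Hyperparameters from Table 2, gathered as a plain Python dict (illustrative key names).
DRL_SGA_CONFIG = {
    "optimizer": "Adam",
    "actor_lr": 1e-3,            # Actor learning rate λ1
    "critic_lr": 1e-3,           # Critic learning rate λ2
    "gamma": 0.9,                # reward discount factor
    "clip_eps": 0.2,             # clipping factor ε
    "replay_capacity": 5000,     # experience replay pool capacity M
    "batch_size": 32,            # number of experience samples D
    "episodes": 200,             # total number of training episodes T
    "n_step": 30,                # interaction frequency N_step
    "gru_units": 3,
    "gat_units": 1,
    "mlp_hidden_dim": 128,
    "num_feasible_paths": 10,    # k
}
```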
Table 3 Performance of the algorithm on different metrics after removing each sub-module

Algorithm (removed sub-module) | Mean reward | Reward variance | End-to-end delay (ms) | Network throughput (kbit/s) | Packet loss rate (%)
DRL-SGA (GAT) | 85.08 | 2.95 | 67.98 | 69.95 | 0.351
DRL-SGA (GRU) | 85.79 | 2.37 | 73.90 | 66.53 | 0.355
DRL-SGA (self-attention) | 86.33 | 2.49 | 70.94 | 75.06 | 0.352
DRL-SGA (none) | 87.29 | 2.80 | 59.12 | 85.30 | 0.348