Anti-strong Jamming Polar Coding Optimization Method with Multiobjective Reinforcement Learning

LIANG Hao; YE Ganhua; LU Ruimin; WANG Heng; WEI Peng

doi: 10.11999/JEIT230572 cstr: 32379.14.JEIT230572

Sixty-third Research Institute, National University of Defense Technology, Nanjing 210001, China

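Funds: The National Natural Science Foundation of China (62201596); The Research Planning Project of the National University of Defense Technology (ZK22-45)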


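About the authors:
LIANG Hao: male, assistant research fellow; research interests: channel coding and satellite communication

YE Ganhua: male, associate research fellow; research interests: anti-jamming for satellite communication

LU Ruimin: male, research fellow; research interests: satellite communication and anti-jamming communication

WANG Heng: male, senior engineer; research interests: signal processing in satellite communication

WEI Peng: male, associate research fellow; research interests: anti-jamming for satellite communication and intelligent anti-jamming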

Corresponding author:
YE Ganhua, milsatcom@163.com

CLC number: TN911.2
Publication history
- Received: 2023-06-09
- Revised: 2023-11-03
- Published online: 2023-11-13
- Issue date: 2023-11-28


Abstract: To improve the reliability and anti-jamming capability of information transmission in Frequency-Hopping (FH) communication systems, a Polar code construction optimization method suited to strong-jamming environments is proposed on the basis of a novel Polar-coded slow FH communication system model. First, a multi-objective reinforcement learning algorithm is designed for the hybrid channel containing a normal state and a jamming state; this algorithm is then used to optimize the information bit-channel sequence of the encoder, which improves the error-correction performance of the resulting Polar codewords. In addition, the complexity of the algorithm is reduced by initialization preprocessing and by computing the reward values theoretically. Simulation results show that, in the hybrid channel containing strong jamming, the overall error performance of the proposed construction is better than that of conventional construction methods, with an overall coding gain of up to 0.5 dB over the 3rd Generation Partnership Project (3GPP) standard scheme for Fifth-Generation (5G) mobile communication systems. The method therefore effectively improves the high-reliability anti-jamming transmission performance of Polar-coded FH communication.
- Channel coding /
- Anti-jamming /
- Polar codes /
- Reinforcement learning /
- Reliability performance
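The hybrid channel matters because jammed hops lower the reliability of some codeword positions, so the information-bit sequences that Algorithm 1 starts from (the GA sets in its step (2)) differ between the normal and jamming states. The Python sketch below illustrates this under assumptions that are not taken from the paper: "GA" is read as Gaussian-approximation construction using Chung's piecewise φ function; BPSK over AWGN gives LLR mean 2/σ² on clean positions; in a hypothetical jamming model, every position of a jammed hop sees extra Gaussian noise of variance known to the receiver; hops are equal and contiguous; and the encoder is x = uF^{⊗n} without bit reversal. All function names are illustrative.

```python
import math


def phi(x: float) -> float:
    """Chung's piecewise approximation of the GA function phi(x), clamped to [0, 1]."""
    if x <= 0.0:
        return 1.0
    if x < 10.0:
        return min(1.0, math.exp(-0.4527 * x ** 0.86 + 0.0218))
    return math.sqrt(math.pi / x) * math.exp(-x / 4.0) * (1.0 - 10.0 / (7.0 * x))


def phi_inv(y: float, hi: float = 1e4) -> float:
    """Invert phi numerically by bisection on [0, hi]."""
    if y >= 1.0:
        return 0.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) > y:
            lo = mid  # phi is (approximately) decreasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ga_means(m: list[float]) -> list[float]:
    """GA mean LLRs of bit channels u_0..u_{N-1} for x = u F^{(x)n} (no bit reversal),
    given per-position channel LLR means m with len(m) = 2^n."""
    if len(m) == 1:
        return list(m)
    half = len(m) // 2
    a, b = m[:half], m[half:]
    # Check-node combination of positions i and i + N/2: feeds u_0..u_{N/2-1}.
    worse = [phi_inv(1.0 - (1.0 - phi(x)) * (1.0 - phi(y))) for x, y in zip(a, b)]
    # Variable-node combination of the same pair: feeds u_{N/2}..u_{N-1}.
    better = [x + y for x, y in zip(a, b)]
    return ga_means(worse) + ga_means(better)


def channel_means(n: int, snr_db: float, jammed: set[int], jsr_db: float) -> list[float]:
    """LLR means 2/sigma^2 for BPSK over AWGN. Hypothetical jamming model: jammed
    positions see extra Gaussian noise of variance 10^(jsr_db/10), known to the receiver."""
    sigma2 = 10.0 ** (-snr_db / 10.0)
    sigma2_jam = 10.0 ** (jsr_db / 10.0)
    return [2.0 / (sigma2 + (sigma2_jam if i in jammed else 0.0)) for i in range(n)]


def jammed_positions(n: int, hops: int, jammed_hops: set[int]) -> set[int]:
    """Codeword positions hit when the listed hops are jammed (equal, contiguous hops)."""
    per_hop = n // hops
    return {h * per_hop + t for h in jammed_hops for t in range(per_hop)}


def info_set(means: list[float], k: int) -> set[int]:
    """Information set: indices of the K bit channels with the largest GA mean."""
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    return set(order[:k])


if __name__ == "__main__":
    N, K, HOPS = 16, 8, 4
    SNR_DB, JSR_DB = 2.0, 10.0
    a_ga = info_set(ga_means(channel_means(N, SNR_DB, set(), JSR_DB)), K)
    # S = 2 hypothetical jamming patterns for rho = 1/4: hop 0 or hop 3 is jammed.
    a_jam = []
    for pattern in ({0}, {3}):
        jam = jammed_positions(N, HOPS, pattern)
        a_jam.append(info_set(ga_means(channel_means(N, SNR_DB, jam, JSR_DB)), K))
    print(sorted(a_ga), [sorted(a) for a in a_jam])
```

Comparing `a_ga` with the entries of `a_jam` shows on which bit channels the normal and jammed states agree; that agreement is what step (3) of Algorithm 1 exploits to shrink the search.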


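Fig. 1 Block diagram of the Polar-coded frequency-hopping communication system

Fig. 2 Markov decision process (MDP) model for Polar code construction

Fig. 3 BLER performance of different information-bit sequence schemes for N = 128

Fig. 4 BLER performance of different information-bit sequence schemes for N = 256

Fig. 5 BLER performance of different information-bit sequence schemes for N = 512

Fig. 6 BLER performance of the information-bit sequence construction schemes under different jamming factors ($\rho_1 = 0.1, 0.2, 0.3$)

Fig. 7 BLER of different information-bit sequence construction schemes with SCL decoding

Fig. 8 BLER of different information-bit sequence construction schemes with CRC-aided SCL decoding

Fig. 9 BLER comparison with the 5G 3GPP standard information-bit sequence construction scheme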

Table 1 Proportion of the action space after initialization preprocessing, $\left|\mathbb{T}_{\mathrm{act}}\right|/K$

N	$\rho_j = 0.1$	$\rho_j = 0.2$	$\rho_j = 0.3$	$\rho_j = 0.4$	$\rho_j = 0.6$
256	0.05	0.09	0.12	0.18	0.25
512	0.03	0.05	0.08	0.10	0.16
1024	0.10	0.12	0.15	0.17	0.20
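Step (3) of Algorithm 1 and the ratio reported in Table 1 come down to plain set operations: the positions on which all GA information sets agree form the initial state, and only the positions on which they disagree remain as actions. The Python sketch below uses hypothetical function names and toy sets for N = 16, K = 4 (not data from the paper); the J > 2 combination follows the wording of step (3) literally.

```python
def init_action_space(info_sets: list[set[int]]) -> tuple[set[int], set[int]]:
    """Step (3) for one jamming factor rho_j.

    info_sets = [A_N^GA, A_{N,1}^GA, ..., A_{N,S}^GA], each of size K.
    Returns (initial state A_{N,in}^o, initial action space T_act).
    """
    common = set.intersection(*info_sets)  # bit channels reliable in every state
    union = set.union(*info_sets)
    return common, union - common  # candidates on which the states disagree


def combine_over_rho(per_rho: list[tuple[set[int], set[int]]]) -> tuple[set[int], set[int]]:
    """J > 2: intersect the initial states and the action spaces across rho_j."""
    states = [s for s, _ in per_rho]
    acts = [t for _, t in per_rho]
    return set.intersection(*states), set.intersection(*acts)


def action_space_ratio(t_act: set[int], k: int) -> float:
    """|T_act| / K, the quantity tabulated in Table 1."""
    return len(t_act) / k


if __name__ == "__main__":
    K = 4
    # Hypothetical GA sets: unjammed plus S = 2 jamming patterns.
    sets = [{15, 14, 13, 11}, {15, 14, 13, 7}, {15, 14, 11, 7}]
    state, t_act = init_action_space(sets)
    # Each set has K elements, so |union| >= K and T_act always holds at least
    # K - |state| candidates: the remaining information bits can always be filled.
    assert len(t_act) >= K - len(state)
    print(sorted(state), sorted(t_act), action_space_ratio(t_act, K))  # [14, 15] [7, 11, 13] 0.75
```

A small ratio means that the preprocessing leaves only a few positions for the learner to decide, which is where the complexity reduction claimed in the abstract comes from.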


Algorithm 1 Polar code construction optimization algorithm based on multi-objective reinforcement learning
(1) Initialize the Polar code length N, the information length K, and the jamming factors $\{\rho_j, 1 \le j \le J-1\}$;
(2) For the unjammed received sequence, and for the received sequences under S different jamming patterns $p_i$ corresponding to jamming factor $\rho_j$, construct the information-bit sequences $\left\{\mathcal{A}_{N}^{\text{GA}},\mathcal{A}_{N,1}^{\text{GA}},\mathcal{A}_{N,2}^{\text{GA}},\cdots,\mathcal{A}_{N,S}^{\text{GA}}\right\}$ respectively;
(3) Determine the initial action space $\mathbb{T}_{\mathrm{act}}=\left\{\mathcal{A}_{N}^{\text{GA}}\cup \mathcal{A}_{N,1}^{\text{GA}}\cup \cdots\cup \mathcal{A}_{N,S}^{\text{GA}}\right\}\Bigr\backslash \left\{\mathcal{A}_{N}^{\text{GA}}\cap \mathcal{A}_{N,1}^{\text{GA}}\cap \cdots\cap \mathcal{A}_{N,S}^{\text{GA}}\right\}$ and the information-bit sequence of the initial state $s$, $\mathcal{A}_{N,\mathrm{in}}^{o}=\left\{\mathcal{A}_{N}^{\text{GA}}\cap \mathcal{A}_{N,1}^{\text{GA}}\cap \cdots\cap \mathcal{A}_{N,S}^{\text{GA}}\right\}$. If J > 2, the sets $\mathcal{A}_{N,\mathrm{in}}^{o}$ and $\mathbb{T}_{\mathrm{act}}$ obtained for the different $\rho_j$ are each intersected with one another. Set the maximum number of episodes to E;
(4) Randomly initialize $TQ(s,a^N)$;
(5) For each episode e ($1 \le e \le E$), repeat steps (6) to (15);
(6) Initialize the state $s$;
(7) For each stage k of the episode, repeat the following operations;
(8) Select action $a_k^N$, estimate the block error rates $\mathrm{bler}_{j,k}$, compute the rewards $r_{j,k}^N$, and obtain $r_{0,k}^N, r_{1,k}^N, \cdots, r_{J-1,k}^N$ and $s'$;
(9) For $j = 0,1,\cdots,J-1$, use $r_{j,k}^N$ and $s'$ to update in turn the Q value of the corresponding received sequence $c_j$: $Q_j(s,a^N) \leftarrow (1-\alpha)\,Q_j(s,a^N) + \alpha\left(r_j + \max_{a^{N'}} Q_j(s',a^{N'})\right)$;
(10) Compute the combined Q value $TQ(s,a^N)$ of the received-sequence cluster $c_k^N = \{c_0,c_1,\cdots,c_{J-1}\}$;
(11) Determine the action $a_k^N$ based on $TQ(s,a^N)$;
(12) Update $\mathcal{A}_{N,k+1}^o = a_k^N \cup \mathcal{A}_{N,k}^o$ and $\mathbb{T}_{\mathrm{act}} \leftarrow \mathbb{T}_{\mathrm{act}} \backslash a_k^N$;
(13) State transition: $s \leftarrow s'$;
(14) Check whether the current state s is terminal: if not, go to step (7); if so, proceed to the next step;
(15) Check whether e = E: if not, go to step (5); if so, proceed to the next step;
(16) Output the constructed optimal information-bit sequence $\mathcal{A}_{N,K}^o$.
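The episode loop of Algorithm 1 can be sketched in Python as follows. Several details are assumptions because the algorithm does not fix them, so this is an illustration rather than the authors' implementation. TQ is taken as a weighted sum (linear scalarization) of the per-objective $Q_j$; actions are chosen ε-greedily from TQ; Q values start at zero instead of random values. The reward for objective j is the decrease of a GA-style BLER estimate, $\mathrm{bler}_j(\mathcal{A}) \approx 1-\prod_{i\in\mathcal{A}}\bigl(1-Q(\sqrt{m_{j,i}/2})\bigr)$, where $m_{j,i}$ is the GA mean of bit channel i in state j. The update follows step (9), with no discount factor. `morl_construct` and the toy bit-channel means are hypothetical.

```python
import math
import random
from collections import defaultdict


def q_func(x: float) -> float:
    """Gaussian tail function Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bler_estimate(info: frozenset[int], means: list[float]) -> float:
    """GA-style estimate 1 - prod_i (1 - Q(sqrt(m_i / 2))) over the information bits."""
    p_ok = 1.0
    for i in info:
        p_ok *= 1.0 - q_func(math.sqrt(means[i] / 2.0))
    return 1.0 - p_ok


def morl_construct(init_state: set[int], actions: set[int], k: int,
                   means_per_obj: list[list[float]], weights: list[float],
                   episodes: int = 300, alpha: float = 0.2, eps: float = 0.2,
                   seed: int = 0) -> frozenset[int]:
    """Multi-objective Q-learning over information sets (hypothetical sketch)."""
    if len(init_state) + len(actions) < k:
        raise ValueError("action space too small to reach K information bits")
    rng = random.Random(seed)
    q = [defaultdict(float) for _ in means_per_obj]  # Q_j(s, a), zero-initialized

    def tq(s: frozenset[int], a: int) -> float:
        # Assumed combination rule: weighted sum (linear scalarization) of the Q_j.
        return sum(w * q[j][(s, a)] for j, w in enumerate(weights))

    def greedy(s: frozenset[int], avail: set[int]) -> int:
        return max(sorted(avail), key=lambda a: tq(s, a))

    for _ in range(episodes):
        s, avail = frozenset(init_state), set(actions)
        while len(s) < k:  # the state is terminal once K information bits are chosen
            a = rng.choice(sorted(avail)) if rng.random() < eps else greedy(s, avail)
            s_next, avail_next = s | {a}, avail - {a}
            done = len(s_next) == k
            for j, means in enumerate(means_per_obj):
                # Reward r_j: decrease of the estimated BLER in state j (always <= 0).
                r = bler_estimate(s, means) - bler_estimate(s_next, means)
                best_next = 0.0 if done else max(q[j][(s_next, b)] for b in avail_next)
                q[j][(s, a)] = (1 - alpha) * q[j][(s, a)] + alpha * (r + best_next)
            s, avail = s_next, avail_next

    # A greedy roll-out with the learned TQ yields the output information set.
    s, avail = frozenset(init_state), set(actions)
    while len(s) < k:
        a = greedy(s, avail)
        s, avail = s | {a}, avail - {a}
    return s


if __name__ == "__main__":
    # Toy GA means for N = 8 (hypothetical): objective 0 unjammed, objective 1 jammed.
    means = [[0.1, 0.5, 0.8, 3.0, 1.2, 4.0, 5.0, 20.0],
             [0.05, 0.2, 0.3, 0.9, 0.6, 1.5, 2.5, 8.0]]
    best = morl_construct({7}, {3, 4, 5, 6}, 4, means, [0.5, 0.5])
    print(sorted(best))
```

Because each reward is the BLER increase caused by one extra information bit, the rewards telescope: the return of an episode equals the initial minus the final estimated BLER, so maximizing it minimizes the final estimate for each state, while the weights trade off the normal and jamming states.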


Table 2 Global performance gain difference (dB) between the proposed optimization scheme and the comparison schemes in the hybrid channel

Code length N	Global gain difference $G_{\text{dB}}$	$\mathrm{BLER}^{\text{th}} = 10^{-2}$	$\mathrm{BLER}^{\text{th}} = 10^{-3}$	$\mathrm{BLER}^{\text{th}} = 10^{-4}$
128	$\sum_{j=0}^{J-1} G_{\text{dB}}[\mathcal{A}_{N,K}^o,\mathcal{A}_N^{\text{GA}}]_{\rho_j}$	0.27	0.28	0.18
128	$\sum_{j=0}^{J-1} G_{\text{dB}}[\mathcal{A}_{N,K}^o,\mathcal{A}^{\text{PW}}]_{\rho_j}$	0.08	0.13	0.27
256	$\sum_{j=0}^{J-1} G_{\text{dB}}[\mathcal{A}_{N,K}^o,\mathcal{A}_N^{\text{GA}}]_{\rho_j}$	0.26	0.28	0.26
256	$\sum_{j=0}^{J-1} G_{\text{dB}}[\mathcal{A}_{N,K}^o,\mathcal{A}^{\text{PW}}]_{\rho_j}$	-0.34	-0.16	0.13
512	$\sum_{j=0}^{J-1} G_{\text{dB}}[\mathcal{A}_{N,K}^o,\mathcal{A}_N^{\text{GA}}]_{\rho_j}$	0.32	0.35	0.37
512	$\sum_{j=0}^{J-1} G_{\text{dB}}[\mathcal{A}_{N,K}^o,\mathcal{A}^{\text{PW}}]_{\rho_j}$	-0.17	0.08	0.45
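Table 2 sums, for each BLER threshold, an SNR gap over the jamming factors. The Python sketch below assumes that $G_{\text{dB}}[\mathcal{A}_{N,K}^o,\mathcal{B}]_{\rho_j}$ is the SNR that the reference sequence $\mathcal{B}$ needs to reach the threshold minus the SNR the proposed sequence needs (so positive values favor the proposed scheme), and that crossing points are found by linear interpolation of log10(BLER) between simulated points. The function names are hypothetical, and the curves in the usage example are synthetic, not results from the paper.

```python
import math

Curve = tuple[list[float], list[float]]  # (SNR points in dB, simulated BLER values)


def snr_at_bler(snr_db: list[float], bler: list[float], target: float) -> float:
    """SNR (dB) where a decreasing BLER curve crosses `target`, interpolating
    log10(BLER) linearly between adjacent simulation points."""
    lt = math.log10(target)
    pts = list(zip(snr_db, (math.log10(b) for b in bler)))
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 >= lt >= y1:
            if y0 == y1:
                return x0
            return x0 + (lt - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target BLER is outside the simulated range")


def global_gain_db(curves_prop: list[Curve], curves_ref: list[Curve], target: float) -> float:
    """Sum over j = 0..J-1 of (reference SNR - proposed SNR) at BLER = target."""
    if len(curves_prop) != len(curves_ref):
        raise ValueError("need one proposed and one reference curve per rho_j")
    return sum(snr_at_bler(*ref, target) - snr_at_bler(*prop, target)
               for prop, ref in zip(curves_prop, curves_ref))


if __name__ == "__main__":
    snr = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Synthetic curves BLER = 10^-(SNR - offset) for J = 2 channel states.
    prop = [(snr, [10 ** -(x - off) for x in snr]) for off in (0.0, 0.5)]
    ref = [(snr, [10 ** -(x - off) for x in snr]) for off in (0.2, 0.6)]
    print(round(global_gain_db(prop, ref, 1e-3), 3))  # 0.3
```

Summing over the states rather than averaging means a gain in the jamming states can offset a loss in the normal state, which is consistent with the mixed signs in the comparison against $\mathcal{A}^{\text{PW}}$.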

