Intelligent Resource Allocation Algorithm Based on Outdated CSI for Multi-Node URLLC
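Summary: Industrial Internet of Things (IIoT) systems typically involve a large number of nodes and strict requirements on communication reliability and latency. With so many nodes, the feedback overhead of Channel State Information (CSI) is high, and CSI is hard to obtain in real time. Resource allocation based on outdated CSI, however, often has to provision redundant resources such as transmit power and blocklength to guarantee reliability, which severely limits system energy efficiency. To address this problem, this paper takes energy efficiency maximization as its objective and proposes a joint power and blocklength allocation algorithm based on Successive Convex Approximation (SCA)-assisted Deep Reinforcement Learning (DRL). First, power and blocklength are pre-allocated with an SCA algorithm to obtain a physically interpretable baseline solution. Then, with the baseline solution as prior information, a DRL algorithm based on Twin Delayed Deep Deterministic Policy Gradient (TD3) is designed to perform incremental optimization. Simulations show that the proposed algorithm copes effectively with channel dynamics and significantly improves system energy efficiency.
Abstract: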
Objective: Ultra-Reliable and Low-Latency Communications (URLLC) are widely used in Industrial Internet-of-Things (IIoT) systems. However, in mobile operation scenarios such as transportation and inspection, acquiring instantaneous Channel State Information (CSI) is often impractical because of feedback overhead, so resource allocation decisions must be made on outdated CSI. This mismatch significantly limits the achievable energy efficiency of the system. Traditional convex optimization methods have difficulty addressing this challenge, and classical Deep Reinforcement Learning (DRL) algorithms show limited convergence stability and policy performance under the stringent Quality-of-Service (QoS) constraints of URLLC. Motivated by these challenges, this paper considers a multi-user URLLC system operating under outdated CSI in dynamic scenarios, formulates an energy efficiency maximization problem subject to communication latency and reliability requirements, and aims to design an efficient and stable algorithm for joint power and blocklength allocation.
Methods: To achieve this objective, this paper proposes a Successive Convex Approximation (SCA)-assisted DRL framework for energy efficiency maximization under outdated CSI. Specifically, an SCA-based algorithm is first developed to derive a pre-allocation of transmit power and blocklength, yielding a baseline solution that is feasible and physically interpretable but relatively conservative. Building on this baseline, a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm performs incremental refinement through interaction with the dynamic environment, which alleviates the conservatism of SCA. In addition, the SCA solution is incorporated into the state representation as prior knowledge, together with user location information. This narrows the policy search space and helps the DRL agent capture large-scale channel characteristics and system dynamics under outdated CSI, which improves learning efficiency and stability.
Results and Discussions: The proposed algorithm is evaluated in simulation against three benchmarks: SCA alone, TD3 without SCA guidance, and TD3 without user location information. The proposed method outperforms all benchmark schemes in both convergence stability and system energy efficiency. During training (Fig. 3), the average reward of the proposed algorithm increases steadily and converges stably. Removing location information leads to low and highly fluctuating rewards, and removing SCA guidance leads to convergence at a much lower reward level, which highlights the importance of both prior guidance and a location-aware state representation. During system operation, the proposed algorithm achieves high and stable energy efficiency (Fig. 4) and outperforms the benchmark algorithms. Under outdated CSI, DRL-based methods outperform conservative optimization when transmission succeeds, whereas the absence of location information or SCA guidance significantly degrades energy efficiency or increases transmission failures. This confirms that both factors contribute to higher energy efficiency and to valid policies. The simulations also examine the impact of key system parameters on energy efficiency. For basic resource parameters such as blocklength (Fig. 5) and power (Fig. 6), appropriately increasing the budget helps improve system energy efficiency. Reliability-related parameters (Fig. 7) should be set according to service requirements to avoid wasting resources. Finally, simulations of the average energy efficiency versus the number of nodes and the number of network neurons offer a reference for configuring the algorithm structure and sizing the network (Fig. 8).
Conclusions: This paper addresses energy-efficient resource allocation for multi-user URLLC systems operating under outdated CSI by integrating SCA with DRL. A TD3-based DRL approach is enhanced by introducing an SCA reference solution as prior guidance and by incorporating user location information into the state representation. This optimization-learning dual-driven framework combines the interpretability and feasibility of model-based optimization with the adaptivity and expressive power of data-driven learning. Simulations show that: (1) the proposed method achieves higher energy efficiency than pure optimization and conventional TD3 while satisfying URLLC latency and reliability constraints; (2) the SCA reference improves the stability and effectiveness of the policy under outdated CSI; (3) incorporating user location information enables more efficient decision-making. This work focuses on a single-cell multi-user scenario and does not consider practical issues such as multi-cell interference, cooperative multi-base-station scheduling, and more complex mobility patterns. Future work will extend the framework to more realistic multi-cell and multi-agent scenarios and investigate its applicability under more severe CSI imperfections.
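The trade-off described in the abstract, where extra transmit power or blocklength buys reliability at the cost of energy, can be illustrated with the well-known normal approximation for the decoding error probability at finite blocklength. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the AWGN-style SNR model `power * gain / noise` is a simplification, and the energy-efficiency definition (successfully delivered bits per joule) is an assumed stand-in, since the paper's exact metric is not given in this section.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error_probability(snr: float, blocklength: int, packet_bits: float) -> float:
    """Normal approximation of the block error probability at finite blocklength."""
    if snr <= 0.0:
        return 1.0  # no usable signal: treat the transmission as failed
    capacity = math.log2(1.0 + snr)  # bits per channel use
    dispersion = (1.0 - (1.0 + snr) ** -2) * math.log2(math.e) ** 2
    rate = packet_bits / blocklength  # bits per channel use
    return q_function((capacity - rate) * math.sqrt(blocklength / dispersion))


def energy_efficiency(power: float, gain: float, noise: float, blocklength: int,
                      packet_bits: float, symbol_time: float) -> float:
    """ASSUMED metric: expected delivered bits per joule of transmit energy."""
    snr = power * gain / noise
    eps = fbl_error_probability(snr, blocklength, packet_bits)
    energy = power * blocklength * symbol_time  # joules spent on one packet
    return packet_bits * (1.0 - eps) / energy


if __name__ == "__main__":
    # Illustrative values only; they are not the paper's simulation setup.
    for m in (30, 40, 50, 80):
        eps = fbl_error_probability(snr=10.0, blocklength=m, packet_bits=100.0)
        ee = energy_efficiency(1.0, 1.0, 0.1, m, 100.0, 1e-3)
        print(f"m={m:3d}  error={eps:.3e}  EE={ee:.1f} bit/J")
```

Running the loop shows the error probability falling sharply as the blocklength grows, while the energy spent per packet grows linearly. An allocation that is sized for a pessimistic channel estimate therefore pays for reliability it may not need.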
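Algorithm 1 Resource pre-allocation procedure
Initialization: initialize a feasible local point $ ({\mathbf{m}}^{0},{\mathbf{p}}^{0}) $ and set $ \tau =0 $
Iteration:
(1) Construct the local problem (LP) around the function $ {F}^{\tau }(\mathbf{m},\mathbf{p}) $
(2) Solve the convex problem (LP) with the ellipsoid method: initialize the ellipsoid; in each iteration, construct a separating hyperplane and update the ellipsoid until the error falls below a threshold, which yields the locally optimal solution $ ({\mathbf{m}}^{\ast },{\mathbf{p}}^{\ast }) $
(3) If the improvement in the objective function is below a threshold: stop iterating and return $ (\left\lceil {\mathbf{m}}^{\ast }\right\rceil ,{\mathbf{p}}^{\ast }) $
    Else: set $ ({\mathbf{m}}^{\tau +1},{\mathbf{p}}^{\tau +1})=({\mathbf{m}}^{\ast },{\mathbf{p}}^{\ast }) $, $ \tau =\tau +1 $, and return to (1)
Algorithm 2 SCA-assisted DRL procedure for joint power and blocklength allocation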
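Initialize the parameters $ {\mathbf{\theta }}_{\mu } $, $ {\mathbf{\theta }}_{1} $, $ {\mathbf{\theta }}_{2} $ of the current Actor network $ \mu $ and the two current Critic networks $ {Q}_{1} $, $ {Q}_{2} $, together with the corresponding target network parameters $ {\mathbf{\theta }}_{{{\mu }_{\text{target}}}} $, $ {\mathbf{\theta }}_{\text{target},1} $, $ {\mathbf{\theta }}_{\text{target},2} $
Initialize the experience replay buffer $ \mathcal{R} $, learning rate $ l $, discount factor $ \delta $, exploration noise $ {\chi }_{t} $, soft-update coefficient $ \xi $, and delayed-update interval $ \kappa $
(1) For each training episode do
(2)   Reset the environment and obtain the initial state $ {\mathbf{s}}_{0} $
(3)   For time step $ t=1\rightarrow {T}_{0} $ do
(4)     Pre-allocate the power and blocklength of all nodes with Algorithm 1 to obtain $ \left({\mathbf{\tilde{m}}}_{t},{\mathbf{\tilde{p}}}_{t}\right) $
(5)     Generate the action from the Actor network and the exploration noise $ {\chi }_{t} $: $ {\mathbf{a}}_{t}=\mu \left({\mathbf{s}}_{t}\right)+{\chi }_{t} $
(6)     The environment executes $ {\mathbf{a}}_{t} $, returns the reward $ {r}_{t} $, and moves to the next state $ {\mathbf{s}}_{t+1} $; store $ \left({\mathbf{s}}_{t},{\mathbf{a}}_{t},{r}_{t},{\mathbf{s}}_{t+1}\right) $ in $ \mathcal{R} $
(7)     Randomly sample a batch of experience from $ \mathcal{R} $ to train the current Actor network and the current Critic networks
(8)     If $ t \bmod \kappa =0 $ then update the current Actor network $ \mu $ and soft-update the target network parameters: $ {\mathbf{\theta }}_{{{\mu }_{\text{target}}}}\leftarrow \xi {\mathbf{\theta }}_{\mu }+\left(1-\xi \right){\mathbf{\theta }}_{{{\mu }_{\text{target}}}} $, $ {\mathbf{\theta }}_{\text{target},i}\leftarrow \xi {\mathbf{\theta }}_{i}+\left(1-\xi \right){\mathbf{\theta }}_{\text{target},i} $, $ i=1,2 $
(9) Output: the converged joint power and blocklength allocation policy
Table 1 Communication system simulation parameters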
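Parameter | Symbol | Value
Noise spectral density | $ \sigma _{k}^{2} $ | $ 1\times {10}^{-7} $
Packet size | $ {D}_{k} $ | $ 100\ \text{bits} $
Transmission duration per modulation symbol | $ T $ | $ 1\ \text{ms} $
Rician factor | $ {K}_{\text{R}} $ | $ 5 $
Rician channel reference distance | $ {d}_{0} $ | $ 1.0\ \text{m} $
Rician channel path loss exponent | $ a $ | $ 2.2 $
Rician channel reference channel gain | $ {g}_{0} $ | $ 1\times {10}^{-3} $
Table 2 TD3 network training hyperparameters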
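Parameter | Symbol | Value
Learning rate | $ l $ | $ 0.0001 $
Discount factor | $ \delta $ | $ 0.99 $
Standard deviation of the exploration noise | $ \sigma $ | $ 0.2 $
Soft-update coefficient | $ \xi $ | $ 0.005 $
Delayed-update interval | $ \kappa $ | $ 2 $
Batch size | $ B $ | $ 256 $
Maximum time steps per episode | $ {T}_{0} $ | $ 1000 $
Scaling factor | $ \alpha $ | $ 100 $
Positive constant bias term | $ \beta $ | $ 100 $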
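Step (8) of the joint allocation procedure and the hyperparameters $ \xi $, $ \kappa $, and $ \delta $ from the TD3 table can be made concrete with a small sketch. It is a toy illustration rather than the authors' implementation: network weights are represented as plain lists of floats, all function names are hypothetical, and the clipped double-Q target is the standard TD3 critic target, which the procedure above does not spell out.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # toy stand-in for network weights

XI = 0.005    # soft-update coefficient (TD3 hyperparameter table)
KAPPA = 2     # delayed-update interval (TD3 hyperparameter table)
DELTA = 0.99  # discount factor (TD3 hyperparameter table)


def soft_update(target: Params, current: Params, xi: float = XI) -> None:
    """Polyak averaging in place: target <- xi * current + (1 - xi) * target."""
    for name, weights in current.items():
        target[name] = [xi * w + (1.0 - xi) * w_t
                        for w, w_t in zip(weights, target[name])]


def clipped_double_q_target(reward: float, q1_next: float, q2_next: float,
                            delta: float = DELTA) -> float:
    """Standard TD3 critic target: bootstrap from the smaller target-critic value."""
    return reward + delta * min(q1_next, q2_next)


def count_delayed_updates(num_steps: int, kappa: int = KAPPA) -> int:
    """Count the steps t in 1..num_steps that satisfy t mod kappa == 0."""
    return sum(1 for t in range(1, num_steps + 1) if t % kappa == 0)


if __name__ == "__main__":
    actor = {"layer": [1.0, -2.0]}
    actor_target = {"layer": [0.0, 0.0]}
    soft_update(actor_target, actor)   # target moves 0.5% of the way to actor
    print(actor_target)
    print(count_delayed_updates(1000))  # Actor/target updates in one episode
    print(clipped_double_q_target(1.0, 5.0, 4.0))
```

With $ \xi =0.005 $ the target networks track the current networks slowly, and with $ \kappa =2 $ the Actor and target networks are updated on every second time step. Both choices damp the effect of Critic estimation errors on the policy.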