Robust Resource Optimization in Integrated Sensing, Communication, and Computing Networks Based on Soft Actor-Critic
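Abstract: Integrated sensing, communication, and computing (ISCC) is a prominent research direction for 6G. To address the high user energy consumption and computational uncertainty of the communication-sensing-computing paradigm in complex scenarios, this paper designs a robust resource allocation and decision optimization scheme for ISCC networks. First, because task complexity is unpredictable, a robust computing resource allocation problem is formulated to handle the uncertainty in offloading decisions. Second, subject to constraints on user power consumption, processing time, and radar estimation information rate, the task offloading ratio, beamforming, and resource allocation are jointly optimized to formulate a total user energy minimization problem. Because this problem is non-convex with coupled variables, it is modeled as a Markov decision process, and an optimization algorithm based on Soft Actor-Critic (SAC) is proposed. Simulation results show that the algorithm is more stable during network training and effectively improves computational robustness. Compared with the Proximal Policy Optimization and Advantage Actor-Critic algorithms, the proposed SAC algorithm reduces user energy consumption by 9.57% and 40.72%, respectively. Moreover, the more users there are, the more pronounced the energy reduction.
Extended Abstract: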
Objective: Traditional approaches typically adopt a disjoint design that improves specific performance aspects under particular scenarios but often proves inadequate for complex tasks in dynamic environments. Real-time task offloading, efficient resource scheduling, and the simultaneous optimization of sensing, communication, and computing performance remain significant challenges. The Integrated Sensing, Communication, and Computing (ISCC) architecture has been proposed to address these issues. In complex scenarios, the diversity of task types and requirements leads to inflexible offloading policies, limiting the system's ability to adapt to real-time network changes. Moreover, computational uncertainty can undermine the robustness of resource scheduling, potentially resulting in performance degradation or task failure. Addressing high user energy consumption and computational uncertainty while maintaining service quality is therefore crucial for optimizing future network nodes. As network environments grow more complex and user demands for high performance, low latency, and reliability rise, improving resource efficiency and achieving mutual benefit across sensing, communication, and computing functions become urgent. Meeting this challenge requires advancing the system towards higher intelligence and multi-dimensional connectivity. Furthermore, research on robust offloading in ISCC networks remains limited and warrants further investigation.

Methods: To address high user energy consumption and computational uncertainty in ISCC networks under complex scenarios, a robust resource allocation and decision optimization scheme is proposed, with the goal of minimizing the total energy consumption of users. The scheme accounts for the constraints and computational uncertainty commonly encountered in practical applications, offering a viable optimization approach for ISCC network design. First, because task complexity is difficult to predict accurately, the potential biases arising from resource allocation and processing estimations are analyzed. These biases reflect real-world unpredictability, where task size can be measured but completion time remains uncertain, potentially leading to resource waste or performance degradation. To mitigate this, a robust computational resource allocation problem is formulated to manage the uncertainty caused by task offloading. Second, the problem of minimizing users' total energy consumption is established by jointly optimizing task offloading ratios, beamforming, and resource allocation, subject to constraints on power consumption, processing time, and radar estimation information rate. Because this optimization problem is multi-variable, non-convex, and NP-hard, traditional methods fail to provide efficient solutions. The problem is therefore modeled as a Markov decision process, and an optimization algorithm based on Soft Actor-Critic (SAC) is proposed.

Results and Discussions: The simulation results show that the proposed SAC-based algorithm outperforms existing methods in performance and flexibility in dynamic and complex scenarios. Specifically, the learning rate affects the convergence speed of the algorithm, but its impact on final performance is minimal (Fig. 3). Compared with the Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms, the proposed algorithm trains faster. Owing to its flexible design, it exhibits stronger exploration capabilities and remains more stable during training (Fig. 4). The robust design enhances adaptability, resulting in higher overall reward values (Fig. 5). In terms of total user energy consumption, the proposed algorithm reduces energy use by approximately 9.57% compared with PPO and by 40.72% compared with A2C. As more users access the network, signal interference intensifies, transmission rates decrease, and task offloading costs rise. In such scenarios, the proposed algorithm adjusts its policy more flexibly and keeps energy consumption at a relatively low level, outperforming both PPO and A2C. This advantage becomes more pronounced as the number of users grows or load pressure increases (Fig. 6). Overall, the proposed algorithm offers a robust and efficient solution for resource allocation and optimization in dynamic and complex environments, with strong adaptability and reliability in multi-user and multi-task scenarios. These results highlight both the performance of the SAC algorithm and its potential for addressing multi-variable, non-convex problems.

Conclusions: This paper presents an optimization algorithm based on SAC, which achieves strong performance in energy consumption, latency, and task offloading efficiency, and also demonstrates scalability and adaptability in multi-user, multi-task, and complex scenarios. A robust computational resource allocation scheme is proposed to address the uncertainty in offloading decisions. Simulation results show that the proposed algorithm can adapt to complex and dynamic network environments through flexible policy decisions, providing both theoretical support and a technical reference for further research on ISCC networks in such scenarios. Future research could explore multi-base station collaboration to enhance the robustness of ISCC networks, enabling them to handle even more complex network environments.
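The energy trade-off and the robust allocation idea described in the Methods can be illustrated with a small Python sketch. It is not the paper's formulation: it assumes the widely used CPU energy model (energy per cycle equal to the effective capacitance coefficient times the squared clock frequency), a fixed uplink rate, and a simple bounded error on the cycles-per-bit estimate. The names `Task`, `worst_case_cycles`, `user_energy`, and `min_robust_frequency`, and all numeric inputs other than the capacitance coefficient from Table 1, are hypothetical.

```python
from dataclasses import dataclass

EPSILON = 1e-27  # effective capacitance coefficient (value listed in Table 1)


@dataclass
class Task:
    bits: float            # task size in bits (assumed measurable)
    cycles_per_bit: float  # nominal estimate of CPU cycles needed per bit
    cycles_error: float    # assumed upper bound on the estimation error (cycles per bit)


def worst_case_cycles(task: Task, share: float) -> float:
    """Upper bound on CPU cycles for a fraction `share` of the task (bounded-error assumption)."""
    return share * task.bits * (task.cycles_per_bit + task.cycles_error)


def user_energy(task: Task, rho: float, f_local: float, p_tx: float, rate: float) -> float:
    """Energy in joules spent by one user that offloads a fraction `rho` of its task.

    Local part: EPSILON * f^2 joules per cycle, evaluated at the worst-case cycle count.
    Offloaded part: transmit power times the time needed to upload rho * bits at `rate` bit/s.
    """
    local = EPSILON * f_local ** 2 * worst_case_cycles(task, 1.0 - rho)
    transmit = p_tx * (rho * task.bits / rate)
    return local + transmit


def min_robust_frequency(task: Task, share: float, deadline: float) -> float:
    """Smallest CPU frequency (Hz) that finishes `share` of the task within `deadline` seconds
    even when the cycle estimate is off by the full assumed error bound."""
    return worst_case_cycles(task, share) / deadline


if __name__ == "__main__":
    task = Task(bits=1e6, cycles_per_bit=500.0, cycles_error=50.0)  # hypothetical task
    for rho in (0.0, 0.5, 1.0):
        e = user_energy(task, rho, f_local=1e9, p_tx=0.5, rate=1e7)
        print(f"rho={rho:.1f}  energy={e:.3f} J")
    print("robust local frequency:", min_robust_frequency(task, 0.5, deadline=1.0), "Hz")
```

Under these assumptions, planning for the worst-case cycle count raises the required frequency and the local energy, which is the cost of robustness that the offloading ratio has to balance against transmission energy.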
Algorithm 1 SAC-based resource optimization algorithm
Step 1: Initialize $\phi$, $\xi$, and the replay buffer
Step 2: For each training episode, do:
Step 3: Initialize the user coordinates $(x_k, y_k)$ and the task type $z$
Step 4: For each environment interaction step, do:
Step 5: Obtain the current environment state $s_n$
Step 6: Select action $a_n$ according to the current policy $\pi^*$
Step 7: Execute action $a_n$
Step 8: Obtain the next environment state $s_{n+1}$
Step 9: Compute the reward $r_n$
Step 10: Store the experience tuple $(s_n, a_n, r_n, s_{n+1})$ in the replay buffer
Step 11: For each gradient update step, do:
Step 12: Randomly sample a mini-batch from the replay buffer
Step 13: Compute the loss functions $L_\pi(\phi)$, $L_Q(\xi_i)$, and $L(\chi)$
Step 14: Update the parameters $\phi$, $\xi_i$, $\xi'_i$, and $\chi$

Table 1 Parameter settings
| Parameter | Value |
|---|---|
| Time slot $\delta_n$ (s) | $1.0$ |
| Minimum radar estimation information rate $R_{\text{rad}}^{\min}$ (dB) | $10^3$ |
| Maximum user transmit power $P_k^{\max}$ (W) | $0.5$ |
| Maximum user computing frequency $f_k^{\max}$ (GHz) | $1.0$ |
| Maximum BS computing frequency $f_{\text{ec}}^{\max}$ (GHz) | $20.0$ |
| Bandwidth $B$ (MHz) | $20$ |
| Radar pulse duration $\mu$ (s) | $2 \times 10^{-5}$ |
| CPU effective capacitance coefficient $\varepsilon$ | $10^{-27}$ |
| Radar waveform power spectral density constant $\eta$ | $\pi/\sqrt{3}$ |
| Radar duty factor $\nu$ | $0.01$ |
| Predefined error-bound threshold $\varepsilon_z$ | $55$ |
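The control flow of Algorithm 1 can be sketched in plain Python as follows. This is a structural illustration only, not the authors' implementation. The environment, policy, and update function are placeholders supplied by the caller. Placing the gradient-update loop after the interaction loop within each episode is one possible arrangement, since the listing does not show nesting. The helpers `soft_q_target` and `polyak_update` follow the standard SAC recipe (twin critics with a minimum, entropy temperature, soft target update), which the source suggests through $\xi_i$, $\xi'_i$, and $\chi$ but does not spell out. All names and default values here are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool holding (s, a, r, s_next) tuples."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


def soft_q_target(reward, next_q1, next_q2, next_log_prob, alpha, gamma=0.99):
    """Standard SAC critic target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    return reward + gamma * (min(next_q1, next_q2) - alpha * next_log_prob)


def polyak_update(target, online, tau=0.005):
    """Soft update of target-critic parameters towards the online critic."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def train(env, policy, update_fn, episodes, steps, grad_steps, batch_size, capacity=10000):
    """Skeleton of Algorithm 1; `env`, `policy`, and `update_fn` are caller-supplied stand-ins."""
    buffer = ReplayBuffer(capacity)                            # Step 1
    for _ in range(episodes):                                  # Step 2
        state = env.reset()                                    # Steps 3 and 5
        for _ in range(steps):                                 # Step 4
            action = policy(state)                             # Step 6
            next_state, reward = env.step(action)              # Steps 7 to 9
            buffer.push((state, action, reward, next_state))   # Step 10
            state = next_state
        for _ in range(grad_steps):                            # Step 11
            if len(buffer) >= batch_size:
                batch = buffer.sample(batch_size)              # Step 12
                update_fn(batch)                               # Steps 13 and 14
    return buffer
```

In a full implementation, `update_fn` would compute the three losses from the sampled batch and apply gradient steps to the actor, critics, and temperature. Here it is left abstract so that the loop structure stays visible.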