Collaborative Air-Ground Computation Offloading and Resource Optimization in Dynamic Vehicular Network Scenarios
Abstract: To address the challenges posed by the rapid growth of mobile users and the sparse distribution of ground infrastructure, this paper proposes an energy harvesting-assisted air-ground collaborative computation offloading architecture. The architecture exploits the flexible mobility of Unmanned Aerial Vehicles (UAVs) and the strong computing capacity of RoadSide Units (RSUs) and the Base Station (BS) to distribute computation tasks dynamically and in real time. In particular, UAVs sustain continuous operation and stable computing performance through energy harvesting. Considering the high mobility of UAVs and ground vehicles, the randomness of vehicular computation tasks, and the time-varying channel model, an energy-constrained long-term optimization problem is formulated to reduce the average delay of the whole system from a global perspective. To solve this complex Mixed Integer Programming (MIP) problem, a computation offloading strategy based on an Improved Actor-Critic reinforcement learning Algorithm (IACA) is proposed. The algorithm applies Lyapunov optimization to decompose the long-term system delay optimization problem into a series of tractable frame-level subproblems. A genetic algorithm is then used to compute target Q-values in place of the target network's output, steering the evolution of the reinforcement learning process and effectively preventing the algorithm from falling into local optima, thereby achieving efficient offloading and resource optimization in dynamic vehicular networks. Comprehensive simulations verify the feasibility and superiority of the proposed offloading architecture and algorithm.
Extended Abstract:
Objective In response to the rapid growth of mobile users and the sparse distribution of ground infrastructure, this research addresses the challenges faced by vehicular networks. It emphasizes the need for efficient computation offloading and resource optimization, highlighting the role of Unmanned Aerial Vehicles (UAVs), RoadSide Units (RSUs), and Base Stations (BSs) in enhancing overall system performance. Methods This paper proposes an energy harvesting-assisted air-ground cooperative computation offloading architecture that integrates UAVs, RSUs, and BSs to collaboratively manage the dynamic task queues generated by vehicles. By incorporating Energy Harvesting (EH) technology, UAVs capture and convert ambient renewable energy, ensuring a continuous power supply and stable computing capability despite their finite onboard energy. To cope with time-varying channel conditions and the high mobility of nodes, a Mixed Integer Programming (MIP) problem is formulated, and offloading decisions and computing resource allocations are adjusted iteratively at low cost to optimize overall system performance. The approach proceeds in three steps. First, the energy harvesting-assisted air-ground cooperative computation offloading framework is introduced, enabling collaborative management of the dynamic task queues generated by vehicles through the integration of UAVs, RSUs, and BSs. Second, to capture the system complexities (time-varying channel conditions, high node mobility, and dynamic task arrivals), the MIP problem is formulated.
The objective is to determine joint offloading decisions and resource allocation strategies that minimize the global service delay while satisfying dynamic and long-term energy constraints. Third, an Improved Actor-Critic Algorithm (IACA) based on reinforcement learning is introduced to solve the formulated MIP problem. The algorithm uses Lyapunov optimization to decompose the long-term problem into frame-level deterministic optimizations, making it tractable, and employs a genetic algorithm to compute target Q-values that guide the reinforcement learning process, improving both solution efficiency and global optimality. IACA then iteratively refines offloading decisions and resource allocations toward optimized system performance. Together, these contributions provide a novel framework and algorithm for air-ground cooperative computation offloading under limited energy resources, fluctuating channel conditions, and high node mobility. Results and Discussions The effectiveness and efficiency of the proposed framework and algorithm are evaluated through extensive simulations. The results demonstrate dynamic and efficient offloading and resource optimization in vehicular networks. The IACA algorithm converges efficiently: over 4,000 training episodes, the agent continuously interacts with the environment, refining its decision-making strategy and updating network parameters. The loss values of both the Actor and Critic networks decrease progressively, indicating an improving model of the real-world environment.
Meanwhile, the reward rises as training proceeds and ultimately stabilizes, signifying that the agent has discovered an effective decision-making strategy. The average system delay and energy consumption are then examined over time slots. As the number of slots increases, the average delay decreases for all algorithms except RA, which remains the highest because it offloads randomly. RLA2C outperforms RLASD thanks to its advantage function. IACA, trained repeatedly in dynamic environments, achieves an average service delay that closely approximates the optimum obtained by CPLEX; it also significantly reduces average energy consumption by minimizing the Lyapunov drift plus penalty, outperforming both RA and RLASD. The impact of task input data size on system performance is examined next. As the data size grows from 750 kbit to 1,000 kbit, both average delay and energy consumption rise. IACA, through effective interaction with the environment and its enhanced genetic algorithm, consistently produces near-optimal solutions, managing both energy and delay well. In contrast, the gap between RLASD, RLA2C, and CPLEX widens because larger tasks destabilize training, while RA causes significant fluctuations in both average delay and energy consumption. The effect of the Lyapunov parameter V on average delay and energy consumption at T = 200 is also illustrated. V provides a fine-grained trade-off: as V increases, the average delay decreases while energy consumption rises, and both eventually stabilize; IACA, with its enhanced Q-values, optimizes delay and energy effectively. Finally, the impact of UAV energy thresholds and UAV counts on average system delay is demonstrated. IACA avoids local optima and adapts well to different thresholds, outperforming RLA2C, RLASD, and RA.
An increase in the number of UAVs initially reduces delay; beyond a point, however, additional UAVs increase delay because each contributes only limited computing power. Conclusions The proposed EH-assisted collaborative air-ground computation offloading framework and IACA algorithm significantly improve the performance of vehicular networks by optimizing offloading decisions and resource allocations. Simulation results validate the effectiveness of the proposed methodology in reducing average delay, enhancing energy efficiency, and increasing system throughput. Future research could integrate more advanced energy harvesting technologies and further refine the proposed algorithm to address the complexities of large-scale vehicular networks.
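The frame-level decomposition mentioned in the Methods section follows the standard Lyapunov drift-plus-penalty pattern. The sketch below is illustrative: the virtual energy-deficit queue $Q_u(t)$ and frame delay $D(t)$ are assumed symbols, since the paper's exact notation is not reproduced in this summary.

```latex
% Quadratic Lyapunov function over the UAVs' virtual energy queues,
% and the conditional one-frame drift (symbols assumed, see lead-in):
L(\boldsymbol{Q}(t)) = \tfrac{1}{2}\sum_{u} Q_u(t)^2, \qquad
\Delta(t) = \mathbb{E}\left[ L(\boldsymbol{Q}(t+1)) - L(\boldsymbol{Q}(t)) \mid \boldsymbol{Q}(t) \right]

% Per-frame surrogate problem: instead of the long-term average-delay
% objective, each frame minimizes the drift-plus-penalty bound
\min_{\boldsymbol{\alpha}^{t},\, \boldsymbol{f}^{t}} \;
\Delta(t) + V\, \mathbb{E}\left[ D(t) \mid \boldsymbol{Q}(t) \right]
```

A larger V weights the delay penalty more heavily relative to the energy drift, which matches the reported trend that increasing V lowers average delay at the cost of higher energy consumption.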
Algorithm 1 Computation offloading strategy based on the improved Actor-Critic reinforcement learning algorithm
Input: system state $ {\boldsymbol{S}}_{t} $, parameter $ V $, reward discount factor $ \gamma $, Actor network structure, Critic network structure
Output: offloading decision $ {\hat{\boldsymbol{\alpha }}}^{t} $ and optimal computing-frequency allocation $ {\hat{\boldsymbol{f}}}^{t} $ for each time frame
(1) Initialize the experience replay buffer, the network model parameters, and the system environment parameters;
(2) for episode $ \leftarrow 1,2,\cdots $ do
(3)  Obtain the initial system state $ {\boldsymbol{S}}_{0} $;
(4)  The Actor generates relaxed actions $ {\hat{\alpha }}_{u,s}^{t},{\hat{f}}_{u}^{t} $ in $ [0,1] $;
(5)  Quantize $ {\hat{\alpha }}_{u,s}^{t} $ into the binary action $ {\hat{\boldsymbol{\alpha }}}^{t} $ and $ {\hat{f}}_{u}^{t} $ into a computing frequency $ {\hat{\boldsymbol{f}}}^{t} $ satisfying the constraints, yielding action $ {\boldsymbol{A}}_{t} $;
(6)  Based on action $ {\boldsymbol{A}}_{t} $, obtain the next state $ {\boldsymbol{S}}_{t+1} $ and the current reward $ {R}_{t} $;
(7)  The improved genetic algorithm generates an offloading decision $ {\bar{\alpha }}_{u,s}^{t} $ and reward $ {\bar{R}}_{t} $;
(8)  if $ {\bar{R}}_{t} > {R}_{t} $ then
(9)   $ {\boldsymbol{A}}_{t}=\left\{{\bar{\alpha }}_{u,s}^{t},{f}_{u}^{t}\right\} $;
(10)   $ {R}_{t}={\bar{R}}_{t} $;
(11)  Store $ \left\{{\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t},{R}_{t},{\boldsymbol{S}}_{t+1}\right\} $ in the replay buffer;
(12)  for Agent do
(13)   Randomly sample a batch $ \left\{{\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t},{R}_{t},{\boldsymbol{S}}_{t+1}\right\} $ from the replay buffer;
(14)   Compute the TD target $ {\lambda }_{t}={R}_{t}+\gamma Q\left({\boldsymbol{S}}_{t+1},{\boldsymbol{A}}_{t+1};{\omega }'\right) $;
(15)   Compute the loss $ \mathrm{Loss}\left(\omega \right)=\dfrac{1}{2}{\left[Q\left({\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t};\omega \right)-{\lambda }_{t}\right]}^{2} $ and update the Critic network;
(16)   Compute the policy gradient $ \mathrm{Loss}\left(\theta \right)={\nabla }_{\theta }\ln {\pi }_{\theta }\left({\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t}\right)Q\left({\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t};\omega \right) $ and update the Actor network;
(17) for $ t=1,2,\cdots ,T $ do
(18)  Obtain the environment state at slot $ t $;
(19)  Use the trained Actor-Critic model to obtain the optimal offloading decision $ {\hat{\boldsymbol{\alpha }}}^{t} $ and computing frequency $ {\hat{\boldsymbol{f}}}^{t} $ for slot $ t $.
Table 1 Experimental parameters
UAV computation energy-efficiency coefficient $ {\kappa }_{u} $: $ 10^{-28} $
UAV flight speed $ {v}_{u}^{t} $: 25 m/s
Available bandwidth $ {B}_{u,v} $: 3 MHz
Available bandwidth $ {B}_{u,r} $: 1 MHz
Available bandwidth $ {B}_{u,0} $: 2.5 MHz
Reward discount factor $ \gamma $: 0.95
Model training optimizer: AdamOptimizer
Batch size: 512
Actor learning rate: 0.001
Critic learning rate: 0.002
Antenna gain $ {A}_{d} $: 3
Carrier frequency $ {F}_{u,r} $: 915 MHz
Path loss $ {g}_{0} $: –40 dB
Reference distance $ {d}_{0} $: 1 m
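Steps (4) to (10) of Algorithm 1, the quantization of the actor's relaxed action and the acceptance of a genetically refined candidate only when it improves the reward, can be sketched in Python. The reward function and mutation-only genetic search below are illustrative stand-ins, not the paper's actual reward (the negative Lyapunov drift plus penalty) or its improved genetic algorithm.

```python
import random

def reward(decision, rates):
    # Toy surrogate reward: a throughput-like score for a binary
    # offloading decision, minus a fixed cost per offloaded task.
    # Stands in for the paper's drift-plus-penalty-based reward.
    return sum(r for d, r in zip(decision, rates) if d == 1) - 0.5 * sum(decision)

def quantize(relaxed):
    # Step (5): round each relaxed action in [0, 1] to a binary decision.
    return [1 if a >= 0.5 else 0 for a in relaxed]

def ga_refine(decision, rates, pop=20, gens=30, pm=0.1, seed=0):
    # Step (7): genetic-style search over binary decisions. Mutation-only
    # for brevity; keeps the best individual found across generations.
    rng = random.Random(seed)
    best, best_r = decision[:], reward(decision, rates)
    population = [decision[:] for _ in range(pop)]
    for _ in range(gens):
        population = [[b ^ 1 if rng.random() < pm else b for b in ind]
                      for ind in population]
        for ind in population:
            r = reward(ind, rates)
            if r > best_r:
                best, best_r = ind[:], r
    return best, best_r

def select_action(relaxed, rates):
    # Steps (4)-(10): quantize the actor's relaxed action, then accept the
    # GA candidate only if it yields a strictly higher reward.
    a_t = quantize(relaxed)
    r_t = reward(a_t, rates)
    a_ga, r_ga = ga_refine(a_t, rates)
    return (a_ga, r_ga) if r_ga > r_t else (a_t, r_t)
```

Because the GA candidate is accepted only on strict improvement, this refinement can never degrade the frame reward relative to the quantized actor output, which is what lets it steer training away from local optima without destabilizing the actor-critic updates.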