Collaborative Air-Ground Computation Offloading and Resource Optimization in Dynamic Vehicular Network Scenarios
Abstract: To address the challenges posed by the rapid growth of mobile users and the sparse distribution of ground infrastructure, this paper proposes an energy-harvesting-assisted air-ground collaborative computation offloading architecture. The architecture exploits the flexible mobility of Unmanned Aerial Vehicles (UAVs) and the strong computing power of RoadSide Units (RSUs) and the Base Station (BS) to distribute computation tasks dynamically in real time. In particular, the UAVs harvest ambient energy to sustain continuous operation and stable computing performance. Considering the high mobility of UAVs and ground vehicles, the randomness of vehicular computation tasks, and the time-varying channel model, an energy-constrained long-term optimization problem is formulated to effectively reduce the average delay of the whole system from a global perspective. To solve this complex Mixed-Integer Programming (MIP) problem, a computation offloading strategy based on an Improved Actor-Critic reinforcement learning Algorithm (IACA) is proposed. The algorithm applies Lyapunov optimization to decompose the long-term system delay optimization problem into a series of tractable frame-level subproblems. A genetic algorithm is then used to compute target Q-values in place of the target network output, steering the evolution of the reinforcement learning process and effectively preventing the algorithm from falling into local optima, thereby achieving efficient offloading and resource optimization in dynamic vehicular networks. Comprehensive simulations verify the feasibility and superiority of the proposed offloading architecture and algorithm.
Abstract:
Objective In response to the rapid growth of mobile users and the sparse distribution of ground infrastructure, this paper addresses the challenges faced by vehicular networks. It emphasizes efficient computation offloading and resource optimization in such networks, leveraging Unmanned Aerial Vehicles (UAVs), RoadSide Units (RSUs), and Base Stations (BSs) to enhance overall system performance. Methods First, the paper proposes an Energy Harvesting (EH)-assisted air-ground cooperative computation offloading framework that integrates UAVs, RSUs, and BSs to collaboratively manage the dynamic task queues generated by vehicles. By incorporating EH technology, UAVs capture and convert ambient renewable energy, ensuring a continuous power supply and stable computing capability despite their limited onboard energy. Second, to capture the time-varying channel conditions, high node mobility, and random task arrivals, the paper formulates a Mixed-Integer Programming (MIP) problem.
This problem seeks the joint offloading decisions and resource allocation strategies that minimize the global service delay while satisfying the system's dynamic and long-term energy constraints. Third, to solve the formulated MIP problem, the paper introduces an Improved Actor-Critic Algorithm (IACA) based on reinforcement learning. The algorithm leverages Lyapunov optimization to decompose the problem into frame-level deterministic optimizations, making it tractable; in addition, a genetic algorithm computes target Q-values that guide the reinforcement learning process, improving solution efficiency and global optimality. IACA iteratively adjusts offloading decisions and resource allocations at low cost to reach the desired system performance. Together, these methods contribute a novel framework and algorithm for air-ground cooperative computation offloading under limited energy resources, time-varying channel conditions, and high node mobility. Results and Discussions The proposed framework and algorithm are evaluated through extensive simulations, which demonstrate their effectiveness and efficiency in achieving dynamic offloading and resource optimization in vehicular networks. (Fig. 3) shows the performance of the IACA algorithm, highlighting its efficient convergence. Over 4,000 training episodes, the agent continuously interacts with the environment, refining its decision-making strategy and updating network parameters. As depicted in Figures 3(a) and 3(b), the loss values of both the Actor and Critic networks decrease progressively, reflecting improved modeling of the real-world environment.
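The Lyapunov-based decomposition described above follows the standard drift-plus-penalty construction. A generic statement is sketched below; the notation is illustrative ($Q_u(t)$, $D(t)$, $B$, and $D^*$ are not the paper's exact symbols):

```latex
% Virtual energy-deficit queues Q_u(t) track the long-term energy constraints;
% D(t) is the system delay in frame t, V the delay/energy trade-off parameter.
\begin{aligned}
L(\boldsymbol{Q}(t)) &= \tfrac{1}{2}\textstyle\sum_{u} Q_u(t)^2
  &&\text{(quadratic Lyapunov function)}\\
\Delta(t) &= \mathbb{E}\bigl[L(\boldsymbol{Q}(t+1)) - L(\boldsymbol{Q}(t)) \,\big|\, \boldsymbol{Q}(t)\bigr]
  &&\text{(conditional Lyapunov drift)}\\
\min_{\boldsymbol{\alpha}^{t},\,\boldsymbol{f}^{t}} \;&\; \Delta(t) + V\,\mathbb{E}\bigl[D(t) \,\big|\, \boldsymbol{Q}(t)\bigr]
  &&\text{(per-frame subproblem)}
\end{aligned}
```

Under this construction, the standard guarantees are that the achieved average delay is within $O(1/V)$ of the optimum $D^*$ while the queue backlogs (and hence the energy overshoot) grow as $O(V)$, which is consistent with the delay-energy trade-off the paper observes when varying $V$.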
Meanwhile, Figure 3(c) shows the reward rising with the number of training episodes and ultimately stabilizing, which signifies that the agent has discovered a more effective decision-making strategy. (Fig. 4) plots average system delay and energy consumption versus the number of time slots. As the number of slots increases, the average delay decreases for all algorithms except RA, which remains the highest because it offloads randomly. RLA2C outperforms RLASD thanks to its advantage function. IACA, trained repeatedly in dynamic environments, achieves an average service delay close to the CPLEX optimum; by minimizing the Lyapunov drift-plus-penalty it also significantly reduces average energy consumption, outperforming RA and RLASD. (Fig. 5) shows the impact of task input data size on system performance. As the data size grows from 750 kbit to 1,000 kbit, both average delay and energy consumption rise. IACA, through effective interaction with the environment and the improved genetic algorithm, robustly generates near-optimal solutions and excels in both energy and delay. In contrast, the gap between RLASD/RLA2C and CPLEX widens because large tasks destabilize their training, and RA exhibits significant fluctuations in both average delay and energy consumption. (Fig. 6) shows the impact of the Lyapunov parameter V on average delay and energy at T = 200. V provides fine-grained control of system performance: as V increases, the average delay drops while the energy consumption rises before stabilizing. IACA, with its improved target Q-values, excels in both delay and energy optimization. Furthermore, Fig. 7 shows how the UAV energy thresholds and the number of UAVs affect the average system delay. IACA avoids local optima and adapts to different thresholds, outperforming RLA2C, RLASD, and RA. Adding UAVs initially reduces delay, but an excess can increase it because each UAV's computing power is limited. Conclusions The proposed EH-assisted collaborative air-ground computation offloading framework and IACA algorithm significantly enhance the performance of vehicular networks by optimizing offloading decisions and resource allocations.
The simulation results demonstrate the effectiveness of the proposed method in reducing average delay, improving energy efficiency, and increasing system throughput. Future work could explore the integration of more advanced EH technologies and further refine the proposed algorithm to address the complexities of large-scale vehicular networks.
1 Computation offloading strategy based on the improved Actor-Critic reinforcement learning algorithm (IACA)
Input: system state $ {\boldsymbol{S}}_{t} $, parameter $ V $, reward discount factor $ \gamma $, Actor network structure, Critic network structure
Output: offloading decision $ {\hat{\boldsymbol{\alpha }}}^{t} $ and the optimal computing-frequency allocation $ {\widehat{\boldsymbol{f}}}^{t} $ for each time frame
(1) Initialize the experience replay buffer, the network parameters, and the system environment parameters;
(2) for episode $ \leftarrow 1,2,\cdots $ do
(3)  Obtain the initial system state $ {\boldsymbol{S}}_{0} $ from the environment;
(4)  The Actor generates relaxed actions $ {\hat{\alpha }}_{u,s}^{t},{\hat{f}}_{u}^{t} $ in $[0,1]$;
(5)  Quantize $ {\hat{\alpha }}_{u,s}^{t} $ into the binary decision $ {\hat{\boldsymbol{\alpha }}}^{t} $ and $ {\hat{f}}_{u}^{t} $ into a computing frequency $ {\widehat{\boldsymbol{f}}}^{t} $ satisfying the constraints, yielding action $ {\boldsymbol{A}}_{t} $;
(6)  Execute action $ {\boldsymbol{A}}_{t} $ to obtain the next state $ {\boldsymbol{S}}_{t+1} $ and the current reward $ {R}_{t} $;
(7)  The improved genetic algorithm generates an offloading decision $ {\bar{\alpha }}_{u,s}^{t} $ and its reward $ {\bar{R}}_{t} $;
(8)  if $ {\bar{R}}_{t} > {R}_{t} $ then
(9)   $ {\boldsymbol{A}}_{t}=\left\{{\bar{\alpha }}_{u,s}^{t},{f}_{u}^{t}\right\} $;
(10)   $ {R}_{t}={\bar{R}}_{t} $;
(11)  Store $ \left\{{\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t},{R}_{t},{\boldsymbol{S}}_{t+1}\right\} $ in the replay buffer;
(12) for Agent do
(13)  Sample a random mini-batch $ \left\{{\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t},{R}_{t},{\boldsymbol{S}}_{t+1}\right\} $ from the replay buffer;
(14)  Compute the TD target $ {\lambda }_{t}={R}_{t}+\gamma Q\left({\boldsymbol{S}}_{t+1},{\boldsymbol{A}}_{t+1};{\omega }'\right) $;
(15)  Compute the loss $ \mathrm{Loss}\left(\omega \right)=\dfrac{1}{2}{\left[Q\left({\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t};\omega \right)-{\lambda }_{t}\right]}^{2} $ and update the Critic network;
(16)  Compute $ \mathrm{Loss}\left(\theta \right)={\nabla }_{\theta }\mathrm{ln}\,{\pi }_{\theta }\left({\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t}\right)Q\left({\boldsymbol{S}}_{t},{\boldsymbol{A}}_{t};\omega \right) $ and update the Actor network via the policy gradient;
(17) for $ t=1,2,\cdots ,T $ do
(18)  Obtain the environment state at slot $t$;
(19)  Use the trained Actor-Critic model to obtain the optimal offloading decision $ {\hat{\boldsymbol{\alpha }}}^{t} $ and computing frequency $ {\hat{\boldsymbol{f}}}^{t} $ for slot $t$;
Table 1 Experimental parameters
UAV computation energy-efficiency coefficient $ {\kappa }_{u} $: $10^{-28}$
UAV flight speed $ {v}_{u}^{t} $: 25 m/s
Available bandwidth $ {B}_{u,v} $: 3 MHz
Available bandwidth $ {B}_{u,r} $: 1 MHz
Available bandwidth $ {B}_{u,0} $: 2.5 MHz
Reward discount factor $ \gamma $: 0.95
Model training optimizer: AdamOptimizer
Batch size: 512
Actor learning rate: 0.001
Critic learning rate: 0.002
Antenna gain $ {A}_{d} $: 3
Carrier frequency $ {F}_{u,r} $: 915 MHz
Path loss $ {g}_{0} $: $-40$ dB
Reference distance $ {d}_{0} $: 1 m
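The per-step flow of Algorithm 1, relaxed action, quantization, GA-based candidate search, and the Actor-Critic updates of steps (14)-(16), can be sketched in Python. This is a minimal illustration, not the paper's implementation: it uses linear function approximators in place of the neural networks, a single transition instead of a 512-sample mini-batch, and a toy reward function and random candidate search standing in for the drift-plus-penalty reward and the improved genetic algorithm. All dimensions and synthetic values are assumptions.

```python
# Sketch of one IACA training step (Algorithm 1, steps (4)-(16)).
import numpy as np

rng = np.random.default_rng(0)
S_DIM, A_DIM = 6, 4                  # assumed state / decision sizes
GAMMA = 0.95                         # reward discount factor (Table 1)
LR_ACTOR, LR_CRITIC = 0.001, 0.002   # learning rates (Table 1)

w = rng.normal(size=S_DIM + A_DIM)       # Critic parameters (omega)
theta = rng.normal(size=(A_DIM, S_DIM))  # Actor parameters

def actor(state):
    """Step (4): relaxed actions in (0, 1) via a sigmoid policy."""
    return 1.0 / (1.0 + np.exp(-theta @ state))

def quantize(relaxed):
    """Step (5): round relaxed offloading variables to a binary decision."""
    return (relaxed >= 0.5).astype(float)

def q_value(state, action):
    """Linear critic Q(S, A; omega)."""
    return w @ np.concatenate([state, action])

def reward(state, action):
    """Toy stand-in for the frame-level drift-plus-penalty reward."""
    return -np.abs(state[:A_DIM] - action).sum()

# Steps (4)-(6): act in the (synthetic) environment.
s_t = rng.normal(size=S_DIM)
a_t = quantize(actor(s_t))
r_t = reward(s_t, a_t)
s_next = rng.normal(size=S_DIM)

# Steps (7)-(10): candidate search; keep the best decision found
# (random sampling here stands in for the improved genetic algorithm).
for _ in range(20):
    cand = rng.integers(0, 2, size=A_DIM).astype(float)
    if reward(s_t, cand) > r_t:
        a_t, r_t = cand, reward(s_t, cand)

# Step (14): TD target lambda_t = R_t + gamma * Q(S_{t+1}, A_{t+1}; omega').
a_next = quantize(actor(s_next))
lam = r_t + GAMMA * q_value(s_next, a_next)

# Step (15): critic loss 0.5 * (Q - lambda)^2; one gradient step on omega.
td_err = q_value(s_t, a_t) - lam
w -= LR_CRITIC * td_err * np.concatenate([s_t, a_t])

# Step (16): policy-gradient step on theta using grad(log pi) * Q.
probs = actor(s_t)
grad_logpi = np.outer(a_t - probs, s_t)  # Bernoulli log-likelihood gradient
theta += LR_ACTOR * q_value(s_t, a_t) * grad_logpi
```

In the full algorithm the transition is first stored in a replay buffer and the updates in steps (14)-(16) run on sampled mini-batches; the sketch applies one update to a single transition to keep the data flow visible.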