UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicles
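Summary: In UAV-assisted Mobile Edge Computing (MEC) for the Internet of Vehicles, Deep Reinforcement Learning (DRL)-based latency optimization suffers from an explosion of the action-space dimension and low training efficiency as the number of vehicles grows. To address this problem, this paper proposes a two-layer hybrid optimization scheme for UAV-assisted MEC. First, task offloading, computing and communication resource allocation, and UAV flight control are jointly optimized to build a model that minimizes the total system task latency subject to flight and energy-consumption constraints. Second, a two-layer algorithm structure combines DRL with a greedy algorithm, yielding the Hybrid Hierarchical Deep Reinforcement Learning (HHDRL) algorithm, which reduces training complexity and accelerates convergence. Simulation results show that the proposed algorithm converges faster while keeping its latency performance close to that of conventional DRL.

Abstract: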
Objective: In the Internet of Vehicles (IoV), using Unmanned Aerial Vehicles (UAVs) to meet the growing demand for edge computing has become a key direction in 6G research. However, when Deep Reinforcement Learning (DRL) is applied to optimize system latency, the action space grows exponentially with the number of vehicles, which makes training difficult and slows convergence. This study proposes a two-layer hybrid solution for UAV-assisted Mobile Edge Computing (MEC) based on DRL, termed Hybrid Hierarchical Deep Reinforcement Learning (HHDRL).

Methods: The HHDRL algorithm adopts a two-layer architecture to decompose the complex optimization task. The upper layer uses an agent based on Proximal Policy Optimization (PPO) with a multi-head actor network to determine user offloading and UAV control policies. N offloading heads make the decisions for the N users: local processing, offloading to an associated CAP, or offloading to the UAV. A separate UAV flight-control head selects a discrete acceleration action so that practical control constraints are satisfied. The lower layer applies a computationally efficient greedy algorithm that prioritizes communication and computing resources according to task characteristics. This hybrid hierarchical design reduces the computational cost associated with DRL-only resource allocation.

Results and Discussions: The performance of the HHDRL scheme was evaluated through numerical simulations using a Rician fading channel model, a UAV flight energy consumption model, and system parameters including task data sizes of 9 to 18 Mbit and task complexities of 2 000 to 3 000 cycles/bit. Figure 3 shows that HHDRL converges faster than standard DRL, although its final reward is slightly lower. Figure 4 indicates that HHDRL preserves the user-delay fairness achieved by DRL. Figure 5 shows that the proposed method reduces system latency by approximately 71% to 91% compared with a random baseline and by 1% to 12% compared with the original DRL algorithm. Figure 6 reports training time for different numbers of users: HHDRL consistently trains faster, and its training time grows more slowly as the number of users increases, because the DRL agent has a much smaller output action space. When the PPO-based upper layer is replaced with other DRL algorithms, the scheme still outperforms the random baseline and performs comparably to non-hierarchical DRL, which demonstrates the generality of the architecture. Figure 8 shows that computing resources have the strongest effect on latency, because computation typically dominates the total task processing time. Figure 9 presents the UAV trajectory optimization results: Figure 9(a) shows realistic velocity changes under discrete acceleration control, and Figure 9(b) shows that the UAV adjusts its position to track the dynamic user distribution while maintaining stable flight.

Conclusions: This study presents an HHDRL algorithm that integrates DRL with a greedy strategy in a hierarchical framework to address the training challenges of UAV-assisted MEC in IoV scenarios. The simulations show that (1) the proposed method accelerates convergence and reduces training time compared with standard DRL; (2) its latency performance is comparable to that of DRL and significantly better than that of heuristic and random baselines; and (3) the framework effectively handles task offloading, resource allocation, and UAV trajectory optimization under practical constraints. Future work will extend the framework to multi-UAV collaboration and more complex environments.
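The action-space argument above can be made concrete with a short calculation. The Python sketch below follows Algorithm 1 and assumes three offloading choices per vehicle and 11 acceleration levels for the UAV head. It compares a hypothetical flat policy, whose single softmax would have to enumerate every joint decision, with the multi-head actor, which emits one small categorical distribution per head. The helpers `accel_from_level` and `step_speed` are illustrative assumptions and are not taken from the paper. They map the head index d(m) onto A = [-5, ..., 5] m/s² by a = d - 5, then apply a 1-D Euler speed update over a 1 s slot, clipped at the 30 m/s limit in Table 1. The paper's actual kinematic model is not given in this section.

```python
"""Action-space size of a flat joint policy vs. the multi-head actor (sketch)."""

NUM_OFFLOAD_OPTIONS = 3  # OP_i(m) in {0, 1, 2}: local, CAP, or UAV (Algorithm 1)
NUM_ACCEL_LEVELS = 11    # d(m) in {0, ..., 10}
ACCEL_MIN = -5.0         # m/s^2, smallest value of the acceleration set A (Table 1)
V_MAX = 30.0             # m/s, maximum UAV flight speed (Table 1)


def flat_action_count(num_vehicles: int) -> int:
    """Discrete actions if one softmax enumerated every joint decision."""
    return NUM_OFFLOAD_OPTIONS ** num_vehicles * NUM_ACCEL_LEVELS


def multi_head_logit_count(num_vehicles: int) -> int:
    """Output logits when every vehicle and the UAV get their own head."""
    return NUM_OFFLOAD_OPTIONS * num_vehicles + NUM_ACCEL_LEVELS


def accel_from_level(level: int) -> float:
    """Assumed mapping of head index d(m) to an acceleration in m/s^2."""
    if not 0 <= level < NUM_ACCEL_LEVELS:
        raise ValueError(f"acceleration level out of range: {level}")
    return ACCEL_MIN + level


def step_speed(speed: float, level: int, slot_s: float = 1.0) -> float:
    """Hypothetical 1-D Euler speed update, clipped to [-V_MAX, V_MAX]."""
    new_speed = speed + accel_from_level(level) * slot_s
    return max(-V_MAX, min(V_MAX, new_speed))


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, flat_action_count(n), multi_head_logit_count(n))
```

For N = 10, the flat space already has 3^10 × 11 = 649 539 joint actions. The multi-head actor outputs only 3 × 10 + 11 = 41 logits, and the gap widens exponentially as N grows.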
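Algorithm 1 Computation task offloading and UAV control decision
(1) Input: current-slot system state $ S(m) $, number of vehicles $ N $, number of CAPs $ K $, multi-head Actor network $ \pi_{\theta} $, Critic network $ V_{\phi} $
(2) Output: computation task offloading policy $ \text{OP}(m) $ and UAV control policy $ F(m) $
(3) State acquisition: obtain the current-slot state $ S(m) $ from the environment, consisting of $ N $ user states, $ K $ CAP states, and 1 UAV state
(4) State input: feed $ S(m) $ into both the Actor network $ \pi_{\theta} $ and the Critic network $ V_{\phi} $
(5) Multi-head Actor inference:
(6) Task offloading heads ($ N $ heads): $ \text{OP}_{i}(m)=\pi_{\theta}^{\text{task}_{i}}(S(m))\in\{0,1,2\},\ i=1,2,\cdots,N $
(7) UAV control head (1 head): $ d(m)=\pi_{\theta}^{\text{UAV}}(S(m))\in\{0,1,2,\cdots,10\} $ (acceleration level selection)
(8) Critic inference: state-value estimate $ V_{\phi}(S(m)) $
(9) Policy output:
(10) Offloading policy: $ \text{OP}(m)=[\text{OP}_{1}(m),\text{OP}_{2}(m),\cdots,\text{OP}_{N}(m)] $
(11) UAV control policy: $ F(m)=d(m) $
(12) Return $ \text{OP}(m) $, $ F(m) $

Algorithm 2 Priority computation of communication and computing resources for computation tasks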
(1) Input: current-slot system state $ S(m) $, computation task offloading policy $ \text{OP}(m) $, task data sizes $ \{D_{i}(m)\}_{i=1}^{N} $, task complexities $ \{\kappa_{i}(m)\}_{i=1}^{N} $
(2) Output: communication resource priority $ P_{\mathrm{comm}}(m) $, computing resource priority $ P_{\mathrm{comp}}(m) $
(3) Initialization: local task list $ L_{\mathrm{local}}(m)=\varnothing $, offloaded task list $ L_{\mathrm{offload}}(m)=\varnothing $
(4) Task classification:
(5) for vehicle $ i=1 $ to $ N $ do
(6) if $ \text{OP}_{i}(m)=0 $ then
(7) $ L_{\mathrm{local}}(m)=L_{\mathrm{local}}(m)\cup\{i\} $ // local computation
(8) else
(9) $ L_{\mathrm{offload}}(m)=L_{\mathrm{offload}}(m)\cup\{i\} $ // offloaded processing
(10) end if
(11) end for
(12) Communication resource priority computation:
(13) for task $ i\in L_{\mathrm{offload}}(m) $ do
(14) communication priority weight: $ W_{i}^{\mathrm{comm}}(m)=D_{i}(m) $
(15) end for
(16) $ P_{\mathrm{comm}}(m)=\mathrm{sort}(L_{\mathrm{offload}}(m),W_{i}^{\mathrm{comm}}(m)) $ // sort in descending order of weight
(17) Computing resource priority computation:
(18) for task $ i\in L_{\mathrm{offload}}(m) $ do
(19) computing priority weight: $ W_{i}^{\mathrm{comp}}(m)=D_{i}(m)\times\kappa_{i}(m) $
(20) end for
(21) $ P_{\mathrm{comp}}(m)=\mathrm{sort}(L_{\mathrm{offload}}(m),W_{i}^{\mathrm{comp}}(m)) $ // sort in descending order of weight
(22) Return $ P_{\mathrm{comm}}(m) $, $ P_{\mathrm{comp}}(m) $

Algorithm 3 HHDRL-based joint optimization scheme
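(1) Input: system time slot $ m $, number of vehicles $ N $, number of CAPs $ K $, maximum number of time slots $ T_{\max} $
(2) Output: joint optimization result that minimizes the cumulative system latency
(3) Initialization: base station controller (BSC), system parameters, network parameters, and experience replay buffer $ B $
(4) while $ m\leq T_{\max} $ do
(5) High-level decision: execute Algorithm 1
(6) Input the current system state $ S(m) $; obtain the offloading policy $ \text{OP}(m) $ and the UAV control policy $ F(m) $
(7) Low-level decision: execute Algorithm 2
(8) Input $ \text{OP}(m) $ and $ F(m) $; obtain the resource priorities $ P_{\mathrm{comm}}(m) $ and $ P_{\mathrm{comp}}(m) $
(9) Form the complete action: combine the high-level policies $ \text{OP}(m) $, $ F(m) $ with the low-level priorities $ P_{\mathrm{comm}}(m) $, $ P_{\mathrm{comp}}(m) $
(10) Environment interaction: execute the complete action and obtain the environment feedback $ S(m+1) $ and $ R(m) $
(11) High-level policy optimization (update of the PPO network parameters $ \theta $ and $ \phi $):
(12) Store the experience tuple in buffer $ B $
(13) if the buffer is full then update the PPO network parameters
(14) $ m\leftarrow m+1 $
(15) end while
(16) Return the joint optimization result that minimizes the cumulative system latency

Table 1 Simulation parameters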
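| Parameter | Value | Description | Parameter | Value | Description |
|---|---|---|---|---|---|
| $ l $ | 2 | Path-loss exponent | $ R_{\text{CAP}}^{\text{comm}} $ | 277 | Maximum number of communication resource blocks (RBs) per CAP |
| $ P_{0} $ | -50 dB | Channel power gain at the reference distance | $ R_{\text{UAV}}^{\text{comm}} $ | 277 | Maximum number of communication RBs of the UAV |
| $ d_{0} $ | 1 m | Reference distance | Lanes | 3 | Number of observed lanes |
| $ N_{0} $ | -169 dBm/Hz | Noise power spectral density | Length | 1200 m | Length of the observed lanes and of the UAV flight-permitted area |
| $ f $ | 2.5 GHz | Computing capability per computing RB | Width | 3.75 m | Width of the observed lanes and of the UAV flight-permitted area |
| $ p_{n} $ | 30 dBm | MU transmit power | $ H $ | 100 m | UAV flight altitude |
| $ b^{\text{sub}} $ | 30 kHz | Bandwidth per communication RB | $ E_{\text{UAV}}^{\max} $ | 500 kJ | Maximum UAV battery capacity |
| $ D $ | 9 to 18 Mbit | MU computation task size | $ E_{\text{UAV}}^{\text{back}} $ | 100 kJ | UAV safe energy level |
| $ \kappa $ | 2000 to 3000 cycle/bit | MU computation task complexity | $ P_{1} $ | 200 W | Blade profile power constant in hovering |
| $ \varepsilon_{\text{UAV}} $ | $ 10^{-27} $ W·s³/cycle³ | UAV computing energy coefficient | $ P_{2} $ | 4340 W | Induced power constant in hovering |
| $ R_{\text{CAP}}^{\text{comp}} $ | 50 | Maximum number of computing RBs per CAP | $ v_{0} $ | 8.92 m/s | Mean rotor induced velocity in hovering |
| $ R_{\text{UAV}}^{\text{comp}} $ | 50 | Maximum number of computing RBs of the UAV | $ v^{\max} $ | 30 m/s | Maximum UAV flight speed |
| $ R_{\text{MU}}^{\text{comp}} $ | 8 | Maximum number of computing RBs per MU | $ A $ | $ [-5,-4,\ldots,5]\ \mathrm{m/s^{2}} $ | UAV acceleration set |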
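The low-level step of Algorithm 2 is simple enough to sketch directly. The Python below follows that listing. Tasks with OP_i(m) = 0 stay local. Offloaded tasks are ranked in descending order of W_i^comm = D_i for communication and W_i^comp = D_i κ_i for computing. The helper `local_latency_s` is a hypothetical addition that uses the common estimate D κ / (n f) for local execution, with the Table 1 values f = 2.5 GHz per computing RB and n = 8 RBs per MU. The paper's exact latency model is not reproduced in this section. The section also does not show how the greedy step turns the priority lists into RB allocations, so the sketch stops at the ordering.

```python
"""Sketch of Algorithm 2: greedy priority ordering of offloaded tasks."""
from dataclasses import dataclass
from typing import List, Tuple

F_PER_RB_HZ = 2.5e9  # computing capability per computing RB (Table 1)
MU_COMP_RBS = 8      # maximum computing RBs per MU (Table 1)


@dataclass
class Task:
    vehicle: int           # vehicle index i
    data_bits: float       # D_i(m) in bits
    cycles_per_bit: float  # kappa_i(m) in CPU cycles per bit
    offload: int           # OP_i(m): 0 = local, 1 or 2 = offloaded


def compute_priorities(tasks: List[Task]) -> Tuple[List[int], List[int], List[int]]:
    """Return (L_local, P_comm, P_comp) as lists of vehicle indices."""
    local = [t.vehicle for t in tasks if t.offload == 0]
    offloaded = [t for t in tasks if t.offload != 0]
    # Communication priority: larger data size D_i first.
    p_comm = [t.vehicle for t in sorted(offloaded,
                                        key=lambda t: t.data_bits,
                                        reverse=True)]
    # Computing priority: larger total workload D_i * kappa_i first.
    p_comp = [t.vehicle for t in sorted(offloaded,
                                        key=lambda t: t.data_bits * t.cycles_per_bit,
                                        reverse=True)]
    return local, p_comm, p_comp


def local_latency_s(task: Task, rbs: int = MU_COMP_RBS) -> float:
    """Hypothetical local execution time D * kappa / (rbs * f), in seconds."""
    return task.data_bits * task.cycles_per_bit / (rbs * F_PER_RB_HZ)


if __name__ == "__main__":
    tasks = [
        Task(1, 9e6, 3000, 1),
        Task(2, 18e6, 2000, 2),
        Task(3, 12e6, 2500, 0),
        Task(4, 10e6, 2000, 1),
    ]
    local, p_comm, p_comp = compute_priorities(tasks)
    print(local, p_comm, p_comp)
    print(local_latency_s(tasks[2]))
```

This example uses four hypothetical tasks drawn from the Table 1 ranges. Vehicle 3 stays local and needs 1.5 s on 8 RBs. The two rankings differ: P_comm = [2, 4, 1] but P_comp = [2, 1, 4]. The reason is that vehicle 1 has the smallest data size but the highest complexity, so it moves up once the cycle count is taken into account.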