YANG Miaoyan, FANG Xuming. UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicular[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250743

UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicular

doi: 10.11999/JEIT250743 cstr: 32379.14.JEIT250743
  • Accepted Date: 2026-01-22
  • Rev Recd Date: 2026-01-22
  • Available Online: 2026-02-12
Objective: In the Internet of Vehicles (IoV), using unmanned aerial vehicles (UAVs) to absorb tidal fluctuations in edge computing demand has become a key 6G technology in recent years. However, when deep reinforcement learning (DRL) is used to optimize system latency, the size of the action space grows exponentially with the number of vehicles, which makes training difficult and convergence slow. This paper therefore proposes a two-layer hybrid DRL-based solution for UAV-assisted mobile edge computing (MEC), called hybrid hierarchical deep reinforcement learning (HHDRL).

Methods: The proposed HHDRL algorithm uses a two-layer architecture to solve the complex optimization problem hierarchically. The upper layer uses a proximal policy optimization (PPO) agent with a multi-head actor network to manage the user offloading policy and the UAV control policy. N heads of this network make the offloading decisions for the N users (local processing, offloading to the associated CAP, or offloading to the UAV). A separate UAV flight control head selects from a set of discrete acceleration actions, which reflects actual control constraints. The lower layer uses a computationally efficient greedy algorithm to prioritize resources according to task characteristics. This hybrid hierarchical approach avoids the high computational cost of resource allocation schemes based solely on DRL.

Results and Discussions: The performance of the proposed HHDRL scheme is verified through numerical simulations. The simulation parameters cover the Rician fading channel, the UAV flight energy consumption model, and system settings (e.g., task data sizes of 9-18 Mbits and task complexities of 2000-3000 cycles/bit).
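To make the motivation for the multi-head actor concrete, the sketch below compares the number of outputs a single flat policy head would need against the number needed when each user has its own offloading head. It is an illustration only, not the authors' network: the three offloading options come from the Methods description, while `accel_choices=5` and both function names are assumptions introduced here.

```python
# Illustrative sketch only. The three offloading options (local, CAP, UAV)
# follow the abstract; accel_choices=5 is an ASSUMED size for the discrete
# acceleration set, and the function names are hypothetical.


def joint_action_count(num_users: int, offload_choices: int = 3,
                       accel_choices: int = 5) -> int:
    """Outputs needed by one flat categorical head over all joint actions."""
    return (offload_choices ** num_users) * accel_choices


def multi_head_output_count(num_users: int, offload_choices: int = 3,
                            accel_choices: int = 5) -> int:
    """Outputs needed with one head per user plus one UAV control head."""
    return num_users * offload_choices + accel_choices


if __name__ == "__main__":
    # Print both counts side by side for a few user counts.
    for n in (5, 10, 20):
        print(n, joint_action_count(n), multi_head_output_count(n))
```

Under these assumptions the flat head grows exponentially in the number of users, while the multi-head output grows linearly, which is consistent with the training difficulty described in the Objective.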
Figure 3 compares the training convergence of the HHDRL scheme and the original DRL algorithm: HHDRL consistently converges faster, although its final reward is slightly lower than that of the pure DRL approach. Figure 4 illustrates the impact of the HHDRL architecture on user delay fairness; the comparison shows that introducing the HHDRL framework does not compromise the user fairness inherent to the DRL method. Figure 5 shows that the proposed scheme reduces system latency by approximately 71%-91% compared with a random baseline and by 1%-12% compared with the original DRL algorithm. Figure 6 analyzes training time for different numbers of users. For every user count, the HHDRL scheme trains in less time than the DRL scheme, and its training time grows more slowly as the number of users increases. This is attributed to the hybrid hierarchical network architecture, which simplifies the DRL output action space. When the upper-layer PPO algorithm is replaced with other DRL algorithms, the scheme still outperforms the random baseline and achieves performance comparable to the non-hybrid-hierarchical approach. This demonstrates the effectiveness and generality of the hybrid hierarchical architecture in achieving significant training acceleration while maintaining performance. The system parameter sensitivity analysis in Figure 8 shows that computational resources have a more significant impact on latency than user transmission power or system bandwidth, because computation latency typically accounts for a larger share of task processing time than communication latency. Figure 9 shows the results of the UAV trajectory optimization.
Figure 9(a) shows the UAV's velocity over time, demonstrating that discrete acceleration control reflects practical control accuracy and response delay rather than idealized instantaneous velocity changes. Figure 9(b) shows the X-coordinates of the UAV and the users over time, illustrating that the UAV adaptively adjusts its position to follow the changing user distribution while maintaining flight stability.

Conclusions: This paper proposes an HHDRL algorithm that integrates DRL with a greedy algorithm in a hierarchical framework to address the difficulty of training UAV-assisted MEC systems in the IoV. Simulation results confirm that: (1) compared with the DRL method, the proposed method significantly accelerates training convergence and shortens training time; (2) the system latency of the proposed algorithm is almost comparable to that of the pure DRL method and significantly better than that of the heuristic and random baseline algorithms; (3) the HHDRL framework effectively manages user task offloading, computing node resource allocation, and UAV trajectory optimization jointly under practical operational constraints. Future work will extend the framework to multi-UAV collaboration and more complex environments.
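The discrete acceleration control discussed for Figure 9(a) can be pictured with a one-dimensional kinematic update. The following is a minimal sketch under stated assumptions, not the paper's flight model: the acceleration set `ACCEL_SET`, the time step `dt`, the speed limit `v_max`, and the function names are all hypothetical values and identifiers chosen for illustration.

```python
# Illustrative 1-D sketch only. ACCEL_SET, dt and v_max are ASSUMED values;
# the abstract states only that the UAV head picks a discrete acceleration.
from typing import Iterable, Tuple

ACCEL_SET = (-2.0, -1.0, 0.0, 1.0, 2.0)  # m/s^2, hypothetical action set


def step(x: float, v: float, action_index: int,
         dt: float = 1.0, v_max: float = 20.0) -> Tuple[float, float]:
    """Apply one discrete acceleration action and return (position, velocity)."""
    a = ACCEL_SET[action_index]
    # Velocity changes gradually and is clipped to the speed limit.
    v_new = max(-v_max, min(v_max, v + a * dt))
    # Trapezoidal position update using the average velocity over the step.
    x_new = x + 0.5 * (v + v_new) * dt
    return x_new, v_new


def rollout(x0: float, v0: float,
            actions: Iterable[int]) -> Tuple[float, float]:
    """Apply a sequence of action indices starting from (x0, v0)."""
    x, v = x0, v0
    for idx in actions:
        x, v = step(x, v, idx)
    return x, v
```

Because the agent chooses an acceleration rather than a velocity, the velocity profile in this sketch can only change by a bounded amount per step, which matches the non-instantaneous velocity changes described above.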

    Figures(9)  / Tables(4)
