
Joint Optimization of Content Caching and Power Distribution for Internet of Vehicles Based on Parametric Reinforcement Learning

LUO Jiangtao, YANG Heping, RAN Yongyi

Citation: LUO Jiangtao, YANG Heping, RAN Yongyi. Joint Optimization of Content Caching and Power Distribution for Internet of Vehicles Based on Parametric Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2023, 45(7): 2476-2483. doi: 10.11999/JEIT220857

doi: 10.11999/JEIT220857
Funds: The National Natural Science Foundation of China (62171072, 62172064, 62003067)
Details
    About the authors:

    LUO Jiangtao: Male, Professor, Ph.D. supervisor. His research interests include next-generation network technologies, communication network testing and optimization, and mobile big data

    YANG Heping: Male, M.S. candidate. His research interest is the Internet of Vehicles

    RAN Yongyi: Male, Lecturer, M.S. supervisor. His research interests include green data centers and satellite Internet

    Corresponding author:

    LUO Jiangtao, Luojt@cqupt.edu.cn

  • CLC number: TN929.5

  • Abstract: In Internet of Vehicles (IoV) scenarios, service content is massive and highly dynamic, so traditional caching mechanisms cannot adequately track content dynamics; moreover, the tension between the huge number of access devices and the limited resources of edge caching devices degrades the system's delay performance. To address these problems, this paper proposes a reinforcement-learning-based joint content caching and power allocation algorithm. First, content caching and power allocation are jointly optimized in a model that minimizes the overall system delay. Second, the optimization problem is formulated as a Markov Decision Process (MDP), in which content caching and content-provider selection are mapped to a discrete action set, while power allocation is mapped to continuous parameters attached to those discrete actions. Finally, the Parametrized Deep Q-Network (P-DQN) algorithm is used to solve the resulting problem with a hybrid discrete-continuous action space. Simulation results show that, compared with the baseline algorithms, the proposed algorithm improves the local cache hit ratio and reduces the system transmission delay.
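To make the hybrid action structure concrete, the following minimal Python sketch shows how a discrete caching/provider choice can be paired with a continuous power parameter under an ε-greedy rule. The action count, state handling, and function names are our own illustrative assumptions, not taken from the paper; only the power bound follows Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_DISCRETE_ACTIONS = 4   # caching / content-provider choices (illustrative)
P_MAX_DBM = 40.0           # maximum allocatable power, as in Table 1

def select_hybrid_action(q_values, power_params, epsilon=0.1):
    """epsilon-greedy selection over a discrete-continuous hybrid action.

    q_values:     Q(s, d, p_d) for each discrete action d, shape (D,)
    power_params: continuous power parameter p_d proposed for each d, shape (D,)
    Returns (d, p_d): the chosen discrete action and its power level.
    """
    if rng.random() < epsilon:
        # Explore: random discrete action paired with a random feasible power.
        d = int(rng.integers(NUM_DISCRETE_ACTIONS))
        p_d = float(rng.uniform(0.0, P_MAX_DBM))
    else:
        # Exploit: greedy discrete action with its learned continuous parameter.
        d = int(np.argmax(q_values))
        p_d = float(np.clip(power_params[d], 0.0, P_MAX_DBM))
    return d, p_d
```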
  • Figure 1  Content caching and delivery model for the Internet of Vehicles

    Figure 2  Workflow of the P-DQN algorithm

    Figure 3  Convergence of the P-DQN-based and DDPG-based schemes

    Figure 4  Average delay under different learning rates

    Figure 5  System performance under different mini-batch sizes

    Figure 6  Changing proportions of content delivery paths in the proposed scheme

    Figure 7  Impact of storage capacity on the cache hit ratio

    Figure 8  Impact of the maximum allocatable power on delay

    Algorithm 1  P-DQN-based joint optimization algorithm
     Initialization: set the maximum number of training episodes $T$, learning rates $\{ \alpha ,\beta \} $, exploration parameter $\varepsilon $, probability-distribution parameter $\xi $, mini-batch size $B$, experience replay pool $\varGamma$, and network weights ${\varphi _1}$ and ${\theta _1}$
     1: for $t = 1$ to $T$ do
     2:  for $k = 1$ to $K$ do
     3:   Compute the action parameters ${p_d} \leftarrow {p_d}({s_t};{\theta _t})$
     4:   Select the action ${a_t} = ({d_t},{p_{{d_t}}})$ with the $\varepsilon $-greedy policy, where
     5:    ${d_t} = \arg {\max _{d \in D}}Q({s_t},d,{p_d};{\varphi _t})$
     6:    ${a_t} = \begin{cases} \text{an action sampled with distribution parameter } \xi, & \text{with probability } \varepsilon \\ ({d_t},{p_{{d_t}}}), & \text{with probability } 1 - \varepsilon \end{cases}$
     7:   Execute ${a_t}$, obtain the delay and the hit ratio, and observe the reward ${r_t}$ and the next state ${s_{t + 1}}$
     8:   Store $[{s_t},{a_t},{r_t},{s_{t + 1}}]$ in $\varGamma$
     9:  Sample $B$ transitions ${\{ {s_b},{a_b},{r_b},{s_{b + 1}}\} _{b \in [B]}}$ from $\varGamma$
     10: ${y_b} = {r_b} + \gamma {\max _{d \in D}}Q({s_{b + 1}},d,{p_d}({s_{b + 1}};{\theta _t});{\varphi _t})$
     11: Use ${\{ {y_b},{s_b},{a_b}\} _{b \in [B]}}$ to compute ${\nabla _\varphi }\ell _t^Q(\varphi )$ and ${\nabla _\theta }\ell _t^\varTheta (\theta )$
     12: Compute ${\varphi _{t + 1}} \leftarrow {\varphi _t} - {\alpha _t}{\nabla _\varphi }\ell _t^Q(\varphi )$ and ${\theta _{t + 1}} \leftarrow {\theta _t} - {\beta _t}{\nabla _\theta }\ell _t^\varTheta (\theta )$
     13: end for
     14: end for
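The update in lines 9-12 can be sketched in PyTorch as follows. This is a minimal sketch under our own assumptions (the state dimension, a sigmoid-scaled power output, and network widths matching the 128×64 hidden layers of Table 1); it illustrates the two P-DQN gradient steps rather than reproducing the authors' implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA = 8, 4, 0.95  # STATE_DIM illustrative; gamma from Table 1

class ParamNet(nn.Module):
    """p_d(s; theta): one continuous power parameter per discrete action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS), nn.Sigmoid())  # power scaled to (0, 1)
    def forward(self, s):
        return self.net(s)

class QNet(nn.Module):
    """Q(s, d, p_d; phi) for all discrete actions d at once."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + NUM_ACTIONS, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS))
    def forward(self, s, p):
        return self.net(torch.cat([s, p], dim=-1))

def pdqn_update(q_net, p_net, q_opt, p_opt, batch):
    """One joint update, mirroring lines 9-12 of Algorithm 1."""
    s, d, r, s_next = batch  # tensors sampled from the replay pool (line 9)
    # Line 10: bootstrapped target using the parameter net at the next state.
    with torch.no_grad():
        y = r + GAMMA * q_net(s_next, p_net(s_next)).max(dim=-1).values
    # Lines 11-12, Q step: minimize the TD error on the taken discrete actions.
    q_taken = q_net(s, p_net(s).detach()).gather(-1, d.unsqueeze(-1)).squeeze(-1)
    q_loss = nn.functional.mse_loss(q_taken, y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    # Lines 11-12, parameter step: move theta to increase Q over all actions.
    # (Gradients this leaves in q_net are cleared by the next q_opt.zero_grad().)
    p_loss = -q_net(s, p_net(s)).sum(dim=-1).mean()
    p_opt.zero_grad(); p_loss.backward(); p_opt.step()
```

With, say, `q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)` and likewise for `p_net` (matching α = β = 0.001 in Table 1), calling `pdqn_update` once per environment step reproduces the loop structure of lines 1-14.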

    Table 1  Simulation parameters

    Parameter                                   Value
    RSU coverage radius (m)                     250
    Number of RSUs                              4
    RSU storage capacity (GB)                   16
    RSU total power (dBm)                       40
    Content size (MB)                           [8, 12]
    Number of vehicles                          100
    Bandwidth (MHz)                             10
    Noise power (dBm)                           –60
    SINR threshold (dB)                         20
    Path-loss model                             128.1 + 37.6 lg(d)
    Hidden layers of the p- and Q-networks      128×64
    Mini-batch size                             64
    Experience replay pool capacity             5000
    Learning rate $\alpha = \beta $             0.001
    Discount factor $\gamma $                   0.95
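For reproduction, the settings of Table 1 can be gathered into one configuration object. A minimal sketch follows; the key names are our own, and the distance unit of the path-loss model is assumed to be kilometres.

```python
import math

def path_loss_db(d_km: float) -> float:
    """Table 1 path-loss model: 128.1 + 37.6 lg(d), with d assumed in km."""
    return 128.1 + 37.6 * math.log10(d_km)

# Simulation settings transcribed from Table 1 (key names are illustrative).
SIM_CONFIG = {
    "rsu_coverage_radius_m": 250,
    "num_rsus": 4,
    "rsu_storage_gb": 16,
    "rsu_total_power_dbm": 40,
    "content_size_mb": (8, 12),        # range of content sizes
    "num_vehicles": 100,
    "bandwidth_mhz": 10,
    "noise_power_dbm": -60,
    "sinr_threshold_db": 20,
    "hidden_layers": (128, 64),        # p-network and Q-network
    "mini_batch_size": 64,
    "replay_pool_capacity": 5000,
    "learning_rate": 0.001,            # alpha = beta
    "discount_factor": 0.95,
}
```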
Publication history
  • Received: 2022-06-27
  • Revised: 2022-11-16
  • Available online: 2022-11-18
  • Issue published: 2023-07-10
