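Virtual Network Function Placement Optimization Algorithm Based on Improved Deep Reinforcement Learning

Lun TANG; Lanqin HE; Qinyi LIAN; Qi TAN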

doi:10.11999/JEIT200297

doi: 10.11999/JEIT200297 cstr: 32379.14.JEIT200297

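Lun TANG^{1, 2},
Lanqin HE^{1, 2},
Qinyi LIAN³,
Qi TAN^{1, 2}

1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
3. College of International Communications, China Three Gorges University, Yichang 443002, China

Funds: National Natural Science Foundation of China (62071078), Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601), Major Theme Special Projects of Chongqing (cstc2019jscx-zdztzxX0006)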

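Details

Author biographies:
Lun TANG: male, born in 1973, professor, Ph.D. His research interests include next-generation wireless communication networks, heterogeneous cellular networks, and software-defined wireless networks.

Lanqin HE: male, born in 1995, master's student. His research interests include 5G network slicing and machine learning algorithms.

Qi TAN: female, born in 1995, master's student. Her research interests include 5G network slicing, resource allocation, and stochastic optimization theory.

Corresponding author:
Lanqin HE, 719097886@qq.com

CLC number: TN929.5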
Publication history
- Received: 2020-04-21
- Revised: 2021-01-22
- Available online: 2021-01-29
- Published: 2021-06-18

Virtual Network Function Placement Optimization Algorithm Based on Improved Deep Reinforcement Learning

Lun TANG^{1, 2},
Lanqin HE^{1, 2},
Qinyi LIAN³,
Qi TAN^{1, 2}

1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
3. College of International Communications, China Three Gorges University, Yichang 443002, China

Funds: The National Natural Science Foundation of China (62071078), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601), The Major Theme Special Projects of Chongqing (cstc2019jscx-zdztzxX0006)

Abstract

Abstract: To address the Service Function Chain (SFC) placement optimization problem caused by the dynamic arrival of network service requests under the Network Function Virtualization/Software Defined Network (NFV/SDN) architecture, a Virtual Network Function (VNF) placement optimization algorithm based on improved deep reinforcement learning is proposed. Firstly, a stochastic optimization model based on a Markov Decision Process (MDP) is established to perform online SFC placement and dynamic resource allocation. The model jointly optimizes SFC placement cost and delay cost, subject to the SFC delay constraint and to physical resource constraints, namely the Central Processing Unit (CPU) resources of general-purpose servers and the bandwidth of physical links. Secondly, because VNF placement and resource allocation involve excessively large state and action spaces and an unknown state transition probability, a VNF intelligent placement algorithm based on deep reinforcement learning is proposed to obtain an approximately optimal VNF placement strategy and resource allocation strategy. Finally, because action exploration and exploitation through the ε-greedy strategy leads to low learning efficiency and slow convergence of the deep reinforcement learning agent, an action exploration and exploitation method based on the difference of value functions is proposed, and a dual experience replay pool is further adopted to solve the problem of low utilization of experience samples. Simulation results show that the algorithm accelerates the convergence of the neural network and simultaneously optimizes the SFC placement cost and the SFC end-to-end delay.

Keywords:
- Virtual Network Function (VNF)
- Deep reinforcement learning
- Service Function Chain (SFC) end-to-end delay
- Service Function Chain (SFC) placement cost
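
As an illustration of the last two ideas in the abstract, the following Python sketch shows one way an exploration rate can be adapted from value-function differences and one way two replay pools can be combined. It is not the authors' implementation: the update rule follows the general value-difference based exploration (VDBE) form, and the names `vdbe_epsilon`, `select_action`, `DualReplayPool`, the parameters `sigma`, `delta`, `reward_threshold`, `priority_share`, and the rule that high-reward transitions enter a second pool are all assumptions made for this example.

```python
import math
import random
from collections import deque


def vdbe_epsilon(epsilon, value_diff, sigma=1.0, delta=0.5):
    """Blend the old epsilon with a bounded function of |value_diff|.

    sigma (sensitivity) and delta (blend weight) are hypothetical defaults.
    """
    x = math.exp(-abs(value_diff) / sigma)
    f = (1.0 - x) / (1.0 + x)  # 0 when the value did not change, near 1 for large changes
    return delta * f + (1.0 - delta) * epsilon


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


class DualReplayPool:
    """Two FIFO pools. Every transition is stored in `regular`; transitions
    with reward above `reward_threshold` are also stored in `priority`.
    This split rule is an assumption, not the rule used in the paper."""

    def __init__(self, capacity=10000, reward_threshold=0.0, priority_share=0.5):
        self.regular = deque(maxlen=capacity)
        self.priority = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold
        self.priority_share = priority_share

    def add(self, state, action, reward, next_state):
        transition = (state, action, reward, next_state)
        self.regular.append(transition)
        if reward > self.reward_threshold:
            self.priority.append(transition)

    def sample(self, batch_size, rng=random):
        """Draw up to `priority_share` of the batch from the priority pool."""
        n_priority = min(int(batch_size * self.priority_share), len(self.priority))
        n_regular = min(batch_size - n_priority, len(self.regular))
        batch = rng.sample(list(self.priority), n_priority)
        batch += rng.sample(list(self.regular), n_regular)
        return batch
```

With this rule, ε shrinks toward zero while value estimates stop changing and grows again when a large value change is observed, which is the behaviour a fixed ε-greedy schedule lacks.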

Figure 1 System model

Figure 2 Framework of the improved deep reinforcement learning algorithm

Figure 3 Loss function comparison

Figure 4 Total system delay comparison

Figure 5 Placement cost comparison

Figure 6 Utility comparison

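Table 1 Simulation parameters of the network scenario

Simulation parameter	Value	Simulation parameter	Value
Packet arrival process	Poisson process, ${\lambda _i} = 2$	Packet size	500 kByte/packet
Total number of general-purpose servers $H$	6	Physical link bandwidth resource	640 MB
CPU resource capacity of general-purpose server $v$	8 cores	Service rate of a single CPU $\beta $	25 MB/s
Discount factor $\gamma $	0.99	Soft update factor $\tau $	0.01
Maximum number of iterations	2000	Learning rate $\alpha $	$\left\{ {0.00001,0.0001} \right\}$
SFC length	Uniform[2,5]	Maximum SFC delay limit ${D_i}$	30 ms
Positive constant $\partial $	30	Positive constant $\varsigma $	20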
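
To make the roles of a few Table 1 parameters concrete, the sketch below plugs $\gamma = 0.99$ and $\tau = 0.01$ into the standard temporal-difference target and soft target-network update, and derives a per-packet processing time from the packet size and the CPU service rate. The function names, the assumption that 1 MB = 1000 kByte, and the assumption that the service rate scales linearly with the number of CPU cores assigned to a VNF are illustrative and are not taken from the paper.

```python
GAMMA = 0.99            # discount factor (Table 1)
TAU = 0.01              # soft update factor (Table 1)
PACKET_KBYTE = 500.0    # packet size in kByte (Table 1)
CPU_RATE_MB_S = 25.0    # service rate of one CPU in MB/s (Table 1)


def td_target(reward, next_value, gamma=GAMMA):
    """One-step temporal-difference target: r + gamma * V(s')."""
    return reward + gamma * next_value


def soft_update(target_params, online_params, tau=TAU):
    """Move each target parameter a fraction tau toward the online parameter."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


def processing_delay_ms(cpus, packet_kbyte=PACKET_KBYTE, rate_mb_s=CPU_RATE_MB_S):
    """Per-packet processing time in ms, ASSUMING linear scaling with cores.

    With 1 MB = 1000 kByte, kByte / (MB/s) is already in milliseconds.
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return packet_kbyte / (rate_mb_s * cpus)
```

Under these assumptions a 500 kByte packet takes 20 ms on one core, so a VNF served by a single core already consumes two thirds of the 30 ms delay limit ${D_i}$.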

