基于迁移演员-评论家学习的服务功能链部署算法

唐伦; 贺小雨; 王晓; 陈前斌

doi:10.11999/JEIT190542

基于迁移演员-评论家学习的服务功能链部署算法

doi: 10.11999/JEIT190542

1.
重庆邮电大学通信与信息工程学院重庆 400065
2.
重庆邮电大学移动通信技术重点实验室重庆 400065

基金项目: 国家自然科学基金(61571073)，重庆市教委科学技术研究项目(KJZD-M20180601)

详细信息

作者简介:
唐伦：男，1973年生，教授，博士生导师，主要研究方向为新一代无线通信网络、异构蜂窝网络等

贺小雨：女，1995年生，硕士生，研究方向为网络切片资源分配和强化学习

王晓：男，1995年生，硕士生，研究方向为网络切片资源优化和机器学习

陈前斌：男，1967年生，教授，博士生导师，主要研究方向为个人通信、多媒体信息处理与传输、下一代移动通信网络、异构蜂窝网络等

通讯作者:
贺小雨　Hexy1995@163.com

中图分类号: TN915
计量
- 文章访问数: 2511
- HTML全文浏览量: 1232
- PDF下载量: 97
- 被引次数: 0
出版历程
- 收稿日期: 2019-07-18
- 修回日期: 2020-03-07
- 网络出版日期: 2020-04-08
- 刊出日期: 2020-11-16

Deployment Algorithm of Service Function Chain Based on Transfer Actor-Critic Learning

1.
School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2.
Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Funds: The National Natural Science Foundation of China (61571073), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M20180601)

摘要

摘要: 针对5G网络切片环境下由于业务请求的随机性和未知性导致的资源分配不合理从而引起的系统高时延问题，该文提出了一种基于迁移演员-评论家(A-C)学习的服务功能链(SFC)部署算法(TACA)。首先，该算法建立基于虚拟网络功能放置、计算资源、链路带宽资源和前传网络资源联合分配的端到端时延最小化模型，并将其转化为离散时间马尔可夫决策过程(MDP)。而后，在该MDP中采用A-C学习算法与环境进行不断交互动态调整SFC部署策略，优化端到端时延。进一步，为了实现并加速该A-C算法在其他相似目标任务中(如业务请求到达率普遍更高)的收敛过程，采用迁移A-C学习算法实现利用源任务学习的SFC部署知识快速寻找目标任务中的部署策略。仿真结果表明，该文所提算法能够减小且稳定SFC业务数据包的队列积压，优化系统端到端时延，并提高资源利用率。
- 网络切片 /
- 服务功能链部署 /
- 马尔可夫决策过程 /
- 演员-评论家学习 /
- 迁移学习
Abstract: To solve the problem of high system delay caused by unreasonable resource allocation because of randomness and unpredictability of service requests in 5G network slicing, this paper proposes a deployment scheme of Service Function Chain (SFC) based on Transfer Actor-Critic (A-C) Algorithm (TACA). Firstly, an end-to-end delay minimization model is built based on Virtual Network Function (VNF) placement, and joint allocation of computing resources, link resources and fronthaul bandwidth resources, then the model is transformed into a discrete-time Markov Decision Process (MDP). Next, A-C learning algorithm is adopted in the MDP to adjust dynamically SFC deployment scheme by interacting with environment, so as to optimize the end-to-end delay. Furthermore, in order to realize and accelerate the convergence of the A-C algorithm in similar target tasks (such as the arrival rate of service requests is generally higher), the transfer A-C algorithm is adopted to utilize the SFC deployment knowledge learned from source tasks to find quickly the deployment strategy in target tasks. Simulation results show that the proposed algorithm can reduce and stabilize the queuing length of SFC packets, optimize the system end-to-end delay, and improve resource utilization.
- Network slice /
- Service Function Chain (SFC) deployment /
- Markov Decision Process (MDP) /
- Actor-Critic (A-C) learning /
- Transfer learning

HTML全文

图 1 系统架构

下载: 全尺寸图片幻灯片

图 2 A-C学习框架

下载: 全尺寸图片幻灯片

图 3 不同演员学习率A-C算法的收敛性

下载: 全尺寸图片幻灯片

图 4 不同评论家学习率A-C算法的收敛性

下载: 全尺寸图片幻灯片

图 5 基于不同优化器的A-C算法的收敛性

下载: 全尺寸图片幻灯片

图 6 3种切片的数据包到达率与队列积压和变化对照图

下载: 全尺寸图片幻灯片

图 7 3个切片的VNF放置方式选择统计图

下载: 全尺寸图片幻灯片

图 8 不同算法的系统收敛时延

下载: 全尺寸图片幻灯片

图 9 不同算法的资源利用率

下载: 全尺寸图片幻灯片

图 10 不同迁移率因子的TACA算法收敛过程

下载: 全尺寸图片幻灯片

表 1 基于迁移A-C学习的SFC部署算法

输入：高斯策略${ {\pi} _\theta }(s,a)\sim N(\mu (s),{\sigma ^2})$，以及其梯度${{\text{∇}} _\theta }\ln { {\pi} _\theta }(s,a)$，状态分布${d^{\pi} }(s)$，学习率${\varepsilon _{a,t}}$和${\varepsilon _{c,t}}$，折扣因子$\beta $
(1) for ${\rm{epsoide } }= 0,1,2, ··· ,E{p_{\max} }$ do
(2) 初始化：策略参数向量${{{\theta }}_t}$，状态-动作值函数参数向量${\omega _t}$，状态值函数参数向量${{{\upsilon}} _t}$，初始状态${s_0}\sim{d_{\pi} }(s)$，本地部署策略${\pi} _\theta ^n(s,a)$，外　　来迁移部署策略${\pi} _\theta ^e(s,a)$
(3) for 回合每一步$t = 0,1, ··· ,T$do
(4) 由式(20)得到整体部署策略，遵循整体策略${ {\pi} _\theta }(s,a)$选择动作${a^{(t)}}$，进行VNF放置和资源分配，而后更新环境状态${s^{(t + 1)}}$，并得到立即　　奖励${R_t} = - \tau (t)$
(5) end for
(6) 评论家过程：
(a) 计算相容特征：由式(10)得处于状态$s$的基函数向量，结合式(14)，式(15)得相容特征
(b) 相容近似：由式(11)得状态-动作值函数近似，由式(16)得状态值函数近似
(c) TD误差计算：由式(12)，式(17)分别得状态-动作值函数、状态值函数的TD误差
(d) 更新评论家参数：由式(13)得状态-动作值函数参数向量更新，由式(18)得状态值函数参数向量更新
(7) 演员过程：
(a) 计算优势函数
(b) 重写策略梯度：代入优势函数由式(19)得策略梯度
(c) 更新演员参数：由式(8)得策略参数向量更新
(8) end for

下载: 导出CSV

参考文献(17)

AGARWAL S, MALANDRINO F, CHIASSERINI C F, et al. VNF placement and resource allocation for the support of vertical services in 5G networks[J]. IEEE/ACM Transactions on Networking, 2019, 27(1): 433–446. doi: 10.1109/TNET.2018.2890631

史久根, 张径, 徐皓, 等. 一种面向运营成本优化的虚拟网络功能部署和路由分配策略[J]. 电子与信息学报, 2019, 41(4): 973–979. doi: 10.11999/JEIT180522

SHI Jiugen, ZHANG Jing, XU Hao, et al. Joint optimization of virtualized network function placement and routing allocation for operational expenditure[J]. Journal of Electronics &Information Technology, 2019, 41(4): 973–979. doi: 10.11999/JEIT180522

LI Defang, HONG Peilin, XUE Kaiping, et al. Virtual network function placement considering resource optimization and SFC requests in cloud datacenter[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(7): 1664–1677. doi: 10.1109/TPDS.2018.2802518

PEI Jianing, HONG Peilin, and LI Defang. Virtual network function selection and chaining based on deep learning in SDN and NFV-Enabled networks[C]. 2018 IEEE International Conference on Communications Workshops, Kansas City, USA, 2018: 1–6. doi: 10.1109/ICCW.2018.8403657.

CAI Yibin, WANG Ying, ZHONG Xuxia, et al. An approach to deploy service function chains in satellite networks[C]. NOMS 2018–2018 IEEE/IFIP Network Operations and Management Symposium, Taipei, China, 2018: 1–7. doi: 10.1109/NOMS.2018.8406159.

QU Long, ASSI C, and SHABAN K. Delay-aware scheduling and resource optimization with network function virtualization[J]. IEEE Transactions on Communications, 2016, 64(9): 3746–3758. doi: 10.1109/TCOMM.2016.2580150

陈前斌, 杨友超, 周钰, 等. 基于随机学习的接入网服务功能链部署算法[J]. 电子与信息学报, 2019, 41(2): 417–423. doi: 10.11999/JEIT180310

CHEN Qianbin, YANG Youchao, ZHOU Yu, et al. Deployment algorithm of service function chain of access network based on stochastic learning[J]. Journal of Electronics &Information Technology, 2019, 41(2): 417–423. doi: 10.11999/JEIT180310

PHAN T V, BAO N K, KIM Y, et al. Optimizing resource allocation for elastic security VNFs in the SDNFV-enabled cloud computing[C]. 2017 International Conference on Information Networking, Da Nang, Vietnam, 2017: 163–166. doi: 10.1109/ICOIN.2017.7899497.

XIA Weiwei and SHEN Lianfeng. Joint resource allocation using evolutionary algorithms in heterogeneous mobile cloud computing networks[J]. China Communications, 2018, 15(8): 189–204. doi: 10.1109/CC.2018.8438283

ZHU Zhengfa, PENG Jun, GU Xin, et al. Fair resource allocation for system throughput maximization in mobile edge computing[J]. IEEE Access, 2018, 6: 5332–5340. doi: 10.1109/ACCESS.2018.2790963

MAO Yuyi, ZHANG Jun, and LETAIEF K B. Dynamic computation offloading for mobile-edge computing with energy harvesting devices[J]. IEEE Journal on Selected Areas in Communications, 2016, 34(12): 3590–3605. doi: 10.1109/JSAC.2016.2611964

MEHRAGHDAM S, KELLER M, and KARL H. Specifying and placing chains of virtual network functions[C]. The 3rd IEEE International Conference on Cloud Networking, Luxembourg, Luxembourg, 2014: 7–13. doi: 10.1109/CloudNet.2014.6968961.

HAGHIGHI A A, HEYDARI S S, and SHAHBAZPANAHI S. MDP modeling of resource provisioning in virtualized content-delivery networks[C]. The 25th IEEE International Conference on Network Protocols, Toronto, Canada, 2017: 1–6. doi: 10.1109/ICNP.2017.8117600.

GRONDMAN I, BUSONIU L, LOPES G A D, et al. A survey of actor-critic reinforcement learning: Standard and natural policy gradients[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) , 2012, 42(6): 1291–1307. doi: 10.1109/TSMCC.2012.2218595

LEE D H and LEE J J. Incremental receptive field weighted actor-critic[J]. IEEE Transactions on Industrial Informatics, 2013, 9(1): 62–71. doi: 10.1109/TII.2012.2209660

LI Rongpeng, ZHAO Zhifeng, CHEN Xianfu, et al. TACT: A transfer actor-critic learning framework for energy saving in cellular radio access networks[J]. IEEE Transactions on Wireless Communications, 2014, 13(4): 2000–2011. doi: 10.1109/TWC.2014.022014.130840

KOUSHI A M, HU Fei, and KUMAR S. Intelligent spectrum management based on transfer actor-critic learning for rateless transmissions in cognitive radio networks[J]. IEEE Transactions on Mobile Computing, 2018, 17(5): 1204–1215. doi: 10.1109/TMC.2017.2744620

施引文献

资源附件(0)

访问统计