Resource Allocation Algorithm of Service Function Chain Based on Asynchronous Advantage Actor-Critic Learning

TANG Lun (唐伦); HE Xiaoyu (贺小雨); WANG Xiao (王晓); TAN Qi (谭颀); HU Yanjuan (胡彦娟); CHEN Qianbin (陈前斌)

doi:10.11999/JEIT200287

cstr: 32379.14.JEIT200287

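1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China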

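Funds: The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M20180601), The Major Theme Special Projects of Chongqing (cstc2019jscx-zdztzxX0006)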

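Author biographies:
TANG Lun: male, born in 1973, professor and doctoral supervisor. His main research interests include next-generation wireless communication networks and heterogeneous cellular networks.

HE Xiaoyu: female, born in 1995, master's student. Her research interests are network slicing resource allocation and reinforcement learning.

WANG Xiao: male, born in 1995, master's student. His research interests are network slicing resource optimization and machine learning.

TAN Qi: female, born in 1995, master's student. Her research interests are 5G network slicing, resource allocation, and stochastic optimization theory.

HU Yanjuan: female, born in 1992, master's student. Her research interests are resource allocation and task offloading in mobile edge computing.

CHEN Qianbin: male, born in 1967, professor and doctoral supervisor. His main research interests include personal communications, multimedia information processing and transmission, next-generation mobile communication networks, and heterogeneous cellular networks.

Corresponding author:
HE Xiaoyu, Hexy1995@163.com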

CLC number: TN929.5
Publication history
- Received: 2020-04-21
- Revised: 2020-09-28
- Published online: 2020-09-30
- Issue date: 2021-06-18


Abstract

Because global network information is difficult to obtain in practice, and because User Equipment (UE) mobility and dynamic packet arrivals complicate resource allocation in radio access network slicing, this paper proposes a Service Function Chain (SFC) resource allocation algorithm based on Asynchronous Advantage Actor-Critic (A3C) learning. First, a blockchain-based resource management mechanism is established, which uses blockchain technology to share and update global network information in a trusted manner and to supervise and record the SFC resource allocation process. Then, a delay minimization model for the joint allocation of radio, computing, and bandwidth resources is built under UE mobility and time-varying packet arrivals, and is further transformed into a Markov Decision Process (MDP). Finally, the A3C learning method is applied to this MDP to solve for the resource allocation policy. Simulation results show that the proposed algorithm uses resources more reasonably and efficiently, optimizes the system delay, and guarantees the requirements of each UE.

Keywords:
- Network slicing
- Service Function Chain (SFC) resource allocation
- Markov Decision Process (MDP)
- Asynchronous Advantage Actor-Critic (A3C) learning
- Blockchain
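The abstract names A3C as the solver but does not show its update rule, so the following is only a minimal sketch of the generic advantage actor-critic quantities that a single A3C worker computes, not the authors' implementation. It assumes that the per-step reward is the negative of the observed delay (a natural choice for a delay minimization MDP, but an assumption here), that the policy is a categorical distribution over discrete allocation actions, and that `delta` plays the role of the entropy hyperparameter $\delta$ mentioned in the Figure 4 caption. The function names `n_step_returns` and `a3c_losses` are hypothetical.

```python
import numpy as np


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R_t = r_t + gamma * R_{t+1}.

    The recursion is seeded with the critic's value estimate for the
    state that follows the last step (use 0.0 at episode end).
    """
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a3c_losses(probs, actions, values, returns, delta=0.01):
    """Actor and critic losses for one rollout of length T.

    probs:   (T, A) action probabilities from the actor
    actions: (T,)   indices of the actions actually taken
    values:  (T,)   critic estimates V(s_t)
    returns: (T,)   n-step returns from n_step_returns
    delta:   weight of the entropy bonus (assumed to match the
             paper's entropy hyperparameter)
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-8, 1.0)
    actions = np.asarray(actions, dtype=int)
    values = np.asarray(values, dtype=float)

    # Advantage: how much better the rollout was than the critic expected.
    # In an autodiff framework it is treated as a constant in the actor loss.
    advantages = np.asarray(returns, dtype=float) - values

    # Log-probability of each taken action and per-step policy entropy.
    log_pi = np.log(probs[np.arange(len(actions)), actions])
    entropy = -np.sum(probs * np.log(probs), axis=1)

    # The actor maximizes log_pi * advantage plus the entropy bonus,
    # so its loss is the negative of that quantity.
    actor_loss = -np.mean(log_pi * advantages + delta * entropy)
    # The critic regresses V(s_t) onto the n-step return.
    critic_loss = np.mean(advantages ** 2)
    return actor_loss, critic_loss


if __name__ == "__main__":
    delays = np.array([1.0, 2.0])   # hypothetical per-step delays
    rewards = -delays               # assumed reward: negative delay
    returns = n_step_returns(rewards, bootstrap_value=0.0, gamma=0.5)
    probs = np.array([[0.5, 0.5], [0.5, 0.5]])
    actor, critic = a3c_losses(probs, [0, 1], returns, returns, delta=0.1)
    print(returns, actor, critic)
```

In a full A3C setup, several workers would compute gradients of these losses in parallel on their own copies of the environment and apply them asynchronously to shared actor and critic parameters. A larger `delta` keeps the policy closer to uniform for longer, which is the exploration trade-off that a convergence comparison across entropy settings would examine.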

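Figure 1 SFC resource allocation framework for access network slicing

Figure 2 Blockchain consensus delay versus the number of SFCs

Figure 3 CPU utilization of blockchain nodes

Figure 4 Convergence of the A3C algorithm under different entropy hyperparameters $\delta$

Figure 5 Percentage variance of resource usage for different learning algorithms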
