
Deployment Algorithm of Service Function Chain Based on Multi-Agent Soft Actor-Critic Learning

TANG Lun, LI Shirui, DU Yucong, CHEN Qianbin

Citation: TANG Lun, LI Shirui, DU Yucong, CHEN Qianbin. Deployment Algorithm of Service Function Chain Based on Multi-Agent Soft Actor-Critic Learning[J]. Journal of Electronics & Information Technology, 2023, 45(8): 2893-2901. doi: 10.11999/JEIT220803

doi: 10.11999/JEIT220803
Funds: The National Natural Science Foundation of China (62071078), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601), Sichuan Science and Technology Program (2021YFQ0053)
Article information
    Author biographies:

    TANG Lun: Male, Professor, Ph.D. supervisor. His research interests include next-generation wireless communication networks, heterogeneous cellular networks, and software-defined wireless networks

    LI Shirui: Male, M.S. candidate. His research interests include network function virtualization, resource allocation, and reinforcement learning

    DU Yucong: Male, M.S. candidate. His research interests include network slicing, resource allocation, and machine learning

    CHEN Qianbin: Male, Professor, Ph.D. supervisor. His research interests include personal communications, multimedia information processing and transmission, next-generation mobile communication networks, and heterogeneous cellular networks

    Corresponding author:

    LI Shirui, 2819717062@qq.com

  • CLC number: TN929.5

  • Abstract: To address the Service Function Chain (SFC) deployment optimization problem caused by dynamically changing service requests under the Network Function Virtualization (NFV) architecture, this paper proposes an SFC deployment optimization algorithm based on Multi-Agent Soft Actor-Critic (MASAC) learning. First, a model minimizing the resource-load penalty, SFC deployment cost, and delay cost is established, subject to SFC end-to-end delay and network resource reservation threshold constraints. Second, the stochastic optimization problem is transformed into a Markov Decision Process (MDP) to realize dynamic SFC deployment and balanced resource scheduling, and a multi-decision-maker orchestration scheme based on the division of services is further proposed. Finally, the Soft Actor-Critic (SAC) algorithm is employed in a distributed multi-agent system to enhance exploration, and a central attention mechanism and an advantage function are introduced so that agents can dynamically and selectively focus on the information that yields a larger deployment return. Simulation results show that the proposed algorithm optimizes the load penalty, delay, and deployment cost, and scales better as the volume of service requests increases.
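The central attention mechanism mentioned in the abstract can be read as a shared critic in which each agent's Q-value attends over the other agents' encoded observation-action pairs. The PyTorch sketch below is an illustrative reconstruction under that reading, not the authors' implementation; the class name AttentionCritic, the layer sizes, and the head count are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCritic(nn.Module):
    """Sketch of a central attention critic: agent i's Q-value attends over
    the other agents' encoded observation-action pairs (multi-head attention)."""
    def __init__(self, obs_dim, act_dim, n_agents, hidden=64, n_heads=4):
        super().__init__()
        assert hidden % n_heads == 0
        self.n_heads = n_heads
        self.encode = nn.Linear(obs_dim + act_dim, hidden)  # per-agent (o_i, a_i) encoder
        self.query = nn.Linear(hidden, hidden, bias=False)
        self.key = nn.Linear(hidden, hidden, bias=False)
        self.value = nn.Linear(hidden, hidden, bias=False)
        self.q_head = nn.Linear(2 * hidden, 1)              # Q_i from own encoding + attended context

    def forward(self, obs, act):
        # obs: (batch, N, obs_dim); act: (batch, N, act_dim)
        e = F.relu(self.encode(torch.cat([obs, act], dim=-1)))        # (B, N, H)
        B, N, H = e.shape
        d = H // self.n_heads
        q = self.query(e).view(B, N, self.n_heads, d).transpose(1, 2)  # (B, h, N, d)
        k = self.key(e).view(B, N, self.n_heads, d).transpose(1, 2)
        v = self.value(e).view(B, N, self.n_heads, d).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / d ** 0.5                    # (B, h, N, N)
        # Mask the diagonal so each agent attends only to the *other* agents.
        eye = torch.eye(N, dtype=torch.bool, device=e.device).view(1, 1, N, N)
        attn = torch.softmax(scores.masked_fill(eye, float('-inf')), dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, N, H)              # (B, N, H)
        return self.q_head(torch.cat([e, ctx], dim=-1)).squeeze(-1)   # (B, N)

# Usage: Q-values for 3 decision makers over a batch of 32 joint transitions.
critic = AttentionCritic(obs_dim=10, act_dim=4, n_agents=3)
q_vals = critic(torch.randn(32, 3, 10), torch.randn(32, 3, 4))  # shape (32, 3)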
  • Figure 1  System architecture

    Figure 2  Multi-agent service orchestration and resource allocation

    Figure 3  Relationship between the soft update factor and convergence

    Figure 4  Relationship between dynamic/static attention and resource allocation

    Figure 5  Comparison of resource usage variance

    Figure 6  Comparison of average delay

    Figure 7  Comparison of network penalty

    Figure 8  The two network overload penalties under each alert threshold

    Algorithm 1  SFC deployment algorithm based on multi-agent soft actor-critic learning
     Input: number of agents $N$, soft update factor $\tau$, discount factor $\gamma$, temperature parameter $\alpha$, number of attention heads $h$, replay buffer size $D$, number of episodes $M$, maximum episode length $T$
     Output: the policy of each agent
     (1) Initialize: $E$ parallel environments, the replay buffer $D$, $T_{\mathrm{update}} \leftarrow 0$
     (2) for $i_{ep} = 1, 2, \cdots, M$ episodes do
     (3)  Reset the SFC deployment environment and initialize each decision maker $i$'s observation $o_i^e$
     (4)  for $t = 1, 2, \cdots, T$ do
     (5)   In each parallel environment, decision maker $i$ samples an action $a_i^e \sim \pi_i(\cdot\,|\,o_i^e)$, performing VNF placement and allocating node CPU and link bandwidth resources
     (6)   All decision makers obtain the local observation $o_i^{\prime e}$ of the SFC deployment and receive the VNF placement and resource allocation reward $r_i^e$
     (7)   if constraints C1–C9 are satisfied, store each environment's transition in $D$
     (8)   $T_{\mathrm{update}} = T_{\mathrm{update}} + E$
     (9)   if $T_{\mathrm{update}} \ge$ the minimum number of steps per update, then
     (10)    for $j = 1, 2, \cdots, \mathrm{num}$ critic-network updates do
     (11)     Sample a mini-batch $B$ from the replay buffer, $(o_{1 \cdots N}^B, a_{1 \cdots N}^B, r_{1 \cdots N}^B, o_{1 \cdots N}^{\prime B}) \leftarrow B$
     (12)     In the parallel environments, compute each decision maker's observation-action value $Q_i^\psi(o_{1 \cdots N}^B, a_{1 \cdots N}^B)$ from Eq. (22) and Eq. (23); compute $a_i^{\prime B} \sim \pi_i^{\bar\theta}(o_i^{\prime B})$ with the target policy network and $Q_i^{\bar\psi}(o_{1 \cdots N}^{\prime B}, a_{1 \cdots N}^{\prime B})$ with the target critic network
     (13)     Compute the joint loss function $L_Q(\psi)$ from Eq. (25) and update the critic networks with Adam
     (14)    end for
     (15)    for $j = 1, 2, \cdots, \mathrm{num}$ actor-network updates do
     (16)     Draw samples $m \times (o_{1 \cdots N}) \sim D$
     (17)     Compute $a_{1 \cdots N}^B \sim \pi_i^{\bar\theta}(o_i^{\prime B})$, $i \in 1, 2, \cdots, N$, and $Q_i^\psi(o_{1 \cdots N}^B, a_{1 \cdots N}^B)$
     (18)     Compute the advantage function from Eq. (28), substitute it into Eq. (30) to obtain $\nabla_{\theta_i} J(\pi_\theta)$, and update the actor networks with Adam
     (19)    end for
     (20)    Update the target critic and actor network parameters by Eq. (27): $\bar\psi \leftarrow (1 - \tau)\bar\psi + \tau\psi$, $\bar\theta \leftarrow (1 - \tau)\bar\theta + \tau\theta$
     (21)    $T_{\mathrm{update}} \leftarrow 0$
     (22)   end if
     (23)  end for
     (24) end for
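To make the control flow of Algorithm 1 concrete, here is a minimal runnable Python skeleton of the training loop (our paraphrase, not the authors' code). The environment dynamics, the policies, and the gradient updates of lines (10)-(19) are hypothetical stubs (DummyEnv, update_critics, update_actors, soft_update_all); only the loop structure, the feasibility check of line (7), and the update cadence of lines (8)-(21) mirror the pseudocode.

import random
import collections

class DummyEnv:
    """Stand-in for one parallel SFC deployment environment."""
    def __init__(self, n_agents):
        self.n_agents = n_agents
    def reset(self):
        return [0.0] * self.n_agents                # one local observation per decision maker
    def step(self, actions):
        next_obs = [a + random.random() for a in actions]
        rewards = [-abs(a) for a in actions]         # placeholder for r_i
        feasible = True                              # stands in for checking constraints C1-C9
        return next_obs, rewards, feasible

def update_critics(batch):   # lines (11)-(13): Eq. (22), (23), (25) would go here
    pass

def update_actors(batch):    # lines (16)-(18): Eq. (28), (30) would go here
    pass

def soft_update_all(tau):    # line (20): Eq. (27) Polyak averaging of target networks
    pass

def train(E=4, N=3, M=10, T=20, min_steps=64, num=2, batch_size=32, tau=0.005):
    envs = [DummyEnv(N) for _ in range(E)]           # line (1): E parallel environments
    buffer = collections.deque(maxlen=10_000)        # replay buffer D
    t_update = 0
    for _ in range(M):                               # line (2)
        obs = [env.reset() for env in envs]          # line (3)
        for _ in range(T):                           # line (4)
            # line (5): a_i ~ pi_i(.|o_i); a random placeholder policy here
            acts = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in envs]
            for e, env in enumerate(envs):           # lines (6)-(7)
                next_o, rewards, feasible = env.step(acts[e])
                if feasible:
                    buffer.append((obs[e], acts[e], rewards, next_o))
                obs[e] = next_o
            t_update += E                            # line (8)
            if t_update >= min_steps and len(buffer) >= batch_size:  # line (9)
                for _ in range(num):                 # lines (10)-(19)
                    update_critics(random.sample(list(buffer), batch_size))
                    update_actors(random.sample(list(buffer), batch_size))
                soft_update_all(tau)                 # line (20)
                t_update = 0                         # line (21)

train()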
Publication history
  • Received: 2022-06-17
  • Revised: 2022-10-13
  • Published online: 2022-12-23
  • Issue date: 2023-08-21
