A Survey of Maintaining the Path Programmability in Software-Defined Wide Area Networks
Abstract: Software-Defined Networking (SDN) is regarded as a key technology for next-generation networks, and in recent years it has become a hot topic in both academia and industry. The Wide Area Network (WAN) is one of the most important industrial application scenarios of SDN; an SDN-based WAN is known as a Software-Defined WAN (SD-WAN). In an SD-WAN, the SDN controller provides flow path programmability by controlling the SDN switches along each flow's forwarding path. This lets the controller change flow paths dynamically, schedule traffic flexibly, and improve network performance. However, controller failure is a common phenomenon in SD-WANs. When a controller fails, the switches it controls go offline, and the flows traversing those offline switches become offline as well. Path programmability can then no longer be guaranteed, flexible traffic scheduling becomes impossible, and network performance degrades severely. This survey reviews research on maintaining path programmability in SD-WANs under controller failures. It first explains the background and significance of maintaining path programmability when controllers fail. Then, based on an analysis of the relevant literature from China and abroad, it introduces the mainstream schemes for controlling switches in SD-WANs when controllers fail. Finally, it summarizes possible further improvements to existing work and discusses future research directions on this topic.
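The failure mechanism described above can be made concrete with a small model. The Python sketch below assumes that each switch has exactly one master controller (a simplification; several surveyed schemes also assign slave or backup controllers) and treats a flow as offline if it traverses at least one offline switch, following the abstract's wording. The function names, topology, controller names, and flow paths are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Set


def offline_switches(mapping: Dict[str, str], failed: Set[str]) -> Set[str]:
    """Return the switches whose (single, assumed) master controller has failed."""
    return {sw for sw, ctrl in mapping.items() if ctrl in failed}


def offline_flows(flow_paths: Dict[str, List[str]],
                  mapping: Dict[str, str],
                  failed: Set[str]) -> Set[str]:
    """Return the flows that traverse at least one offline switch.

    Simplified model: a flow loses path programmability as soon as any
    switch on its forwarding path loses its controller.
    """
    down = offline_switches(mapping, failed)
    return {f for f, path in flow_paths.items() if any(sw in down for sw in path)}


if __name__ == "__main__":
    # Hypothetical 4-switch network controlled by two controllers, C1 and C2.
    mapping = {"s1": "C1", "s2": "C1", "s3": "C2", "s4": "C2"}
    flows = {"f1": ["s1", "s2"], "f2": ["s3", "s4"], "f3": ["s2", "s3"]}
    print(sorted(offline_switches(mapping, {"C1"})))      # ['s1', 's2']
    print(sorted(offline_flows(flows, mapping, {"C1"})))  # ['f1', 'f3']
```

When C1 fails, f3 is affected even though half of its path is still under C2. This is why the later recovery schemes in Table 1 reason about flows (for example, flow-controller remapping) rather than only about switches.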
Table 1 Current research on maintaining path programmability
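| Recovery type | Recovery goal | Recovery method | Optimization objective | Solution method | Ref. |
|---|---|---|---|---|---|
| Static | Reduce failure probability | Optimal controller placement | Control latency | Pareto optimization | [11] |
| | | | Controller deployment cost and routing cost | ILP | [12] |
| | | | Number of required controllers | ILP and heuristic algorithm | [13] |
| | | | Control latency | MILP and simulated annealing | [14] |
| | | | Control latency | Heuristic algorithm | [15] |
| | | | Node importance | Heuristic algorithm | [16] |
| | | | Link upgrade cost | ILP | [17] |
| | | Resilient control architecture design | Number of IP routers to upgrade | Heuristic algorithm | [18] |
| | | | Heterogeneity of controller views | Heuristic algorithm | [19] |
| | | | Controller utilization | ILP and heuristic algorithm | [20] |
| | | | Number of failed control paths | ILP | [21] |
| | | | Mapping robustness | ILP and heuristic algorithm | [22] |
| | Reduce impact after failure | Master-slave controller assignment | Load variation | ILP and heuristic algorithm | [23] |
| | | | Control latency, controller load balancing, and mapping robustness | ILP and heuristic algorithm | [24] |
| | | | Control latency | ILP and greedy algorithm | [25] |
| | | | Controller load balancing | ILP and simulated annealing | [26] |
| | | | Control latency | MILP and greedy algorithm | [27] |
| | | Failure detection | Controller recovery effectiveness | Blockchain-based heuristic algorithm | [28] |
| | | | Application QoS | Controller load migration framework | [29] |
| | | | Failure recovery speed | Advanced Message Queuing Protocol (AMQP) | [30] |
| | | | Network reliability, power cost, and control latency | ILP, SVM-based classification, and greedy algorithm | [31] |
| | | | Remapping cost | ILP | [32] |
| | | | Controller load balancing | Controller-load-based greedy algorithm | [33] |
| Dynamic | Maintain control resilience | Initial switch-controller mapping | Controller load balancing | ILP and simulated annealing | [34] |
| | | | Control latency, controller load balancing, and mapping robustness | Deep Q-learning | [35] |
| | | | Number of required controllers | LP | [36] |
| | Improve recovery effectiveness | Switch-controller remapping | Controller load balancing and controller failure probability | MILP and genetic algorithm | [37] |
| | | | Number of required controllers | LP and heuristic algorithm | [38] |
| | | | Flow setup time | MILP | [39] |
| | | | Controller-switch information exchange time | ILP | [40] |
| | | | Load variation and switch migration cost | MILP and heuristic algorithm | [41] |
| | | | Controller load balancing and control latency | MILP and heuristic algorithm | [42] |
| | | | Number of recovered flows | MILP and heuristic algorithm | [43] |
| | | Flow-controller remapping | Programmability balance, total programmability, and control latency | MILP and heuristic algorithm | [44] |
| | | | Programmability balance and total programmability | MILP and heuristic algorithm | [45] |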
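Several entries in Table 1 solve controller assignment or switch-controller remapping with greedy heuristics (e.g., [25], [27], [33]). The Python sketch below shows the general shape of such a heuristic. It is a generic illustration and not the algorithm of any cited work. It assumes a hypothetical per-switch control load, a spare-capacity budget for each surviving controller, and a switch-to-controller delay table, and it minimizes control latency greedily, one switch at a time.

```python
from typing import Dict, List, Optional


def greedy_remap(offline: List[str],
                 delay: Dict[str, Dict[str, float]],
                 spare: Dict[str, int],
                 load: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Generic greedy switch-controller remapping (illustrative only).

    offline: switches that lost their controller.
    delay:   hypothetical delay[switch][controller] to each active controller.
    spare:   hypothetical spare capacity of each active controller.
    load:    hypothetical control load generated by each switch.
    Returns switch -> chosen controller, or None if no controller can absorb it.
    """
    spare = dict(spare)  # work on a copy; do not mutate the caller's data
    result: Dict[str, Optional[str]] = {}
    # Place the heaviest switches first, since they are the hardest to fit.
    for sw in sorted(offline, key=lambda s: -load[s]):
        candidates = [c for c in spare if spare[c] >= load[sw]]
        if not candidates:
            result[sw] = None  # the switch stays offline
            continue
        # Among controllers with enough capacity, pick the lowest-delay one.
        best = min(candidates, key=lambda c: delay[sw][c])
        spare[best] -= load[sw]
        result[sw] = best
    return result


if __name__ == "__main__":
    # Hypothetical scenario: controller C1 failed; C2 and C3 survive.
    delay = {"s1": {"C2": 5, "C3": 8}, "s2": {"C2": 4, "C3": 6},
             "s3": {"C2": 2, "C3": 9}, "s4": {"C2": 1, "C3": 1}}
    load = {"s1": 30, "s2": 20, "s3": 10, "s4": 10}
    spare = {"C2": 35, "C3": 30}
    print(greedy_remap(["s1", "s2", "s3", "s4"], delay, spare, load))
    # {'s1': 'C2', 's2': 'C3', 's3': 'C3', 's4': None}
```

The greedy order trades optimality for speed. s2 ends up on the slower controller C3 because s1 already consumed most of C2's capacity, and s4 stays offline. The ILP/MILP formulations listed in Table 1 search for assignments that avoid such myopic choices, at a much higher computational cost.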
References
[1] KREUTZ D, RAMOS F M V, VERÍSSIMO P E, et al. Software-defined networking: A comprehensive survey[J]. Proceedings of the IEEE, 2015, 103(1): 14–76. doi: 10.1109/JPROC.2014.2371999
[2] JAIN S, KUMAR A, MANDAL S, et al. B4: Experience with a globally-deployed software defined WAN[J]. ACM SIGCOMM Computer Communication Review, 2013, 43(4): 3–14. doi: 10.1145/2534169.2486019
[3] HONG Chiyao, KANDULA S, MAHAJAN R, et al. Achieving high utilization with software-driven WAN[C]. The ACM SIGCOMM 2013 Conference on SIGCOMM, Hong Kong, China, 2013: 15–26.
[4] First in the U.S. to Mobile 5G – What’s next? Defining AT&T’s network path in 2019 and beyond[EB/OL]. https://about.att.com/story/2019/2019_and_beyond.html, 2019.
[5] OpenFlow Switch Specification, Version 1.5.1 (Protocol version 0x06)[EB/OL]. https://www.opennetworking.org/wp-content/uploads/2014/10/openflow-switch-v1.5.1.pdf, 2015.
[6] LEVIN D, WUNDSAM A, HELLER B, et al. Logically centralized?: State distribution trade-offs in software defined networks[C]. The First Workshop on Hot Topics in Software Defined Networks, Helsinki, Finland, 2012: 1–6.
[7] HELLER B, SHERWOOD R, and MCKEOWN N. The controller placement problem[J]. ACM SIGCOMM Computer Communication Review, 2012, 42(4): 473–478. doi: 10.1145/2377677.2377767
[8] ONOS controller[EB/OL]. https://onosproject.org/.
[9] OpenDaylight controller[EB/OL]. https://www.opendaylight.org/.
[10] ONGARO D and OUSTERHOUT J. In search of an understandable consensus algorithm[C]. The 2014 USENIX Conference on USENIX Annual Technical Conference, Philadelphia, USA, 2014: 305–320.
[11] HOCK D, HARTMANN M, GEBERT S, et al. Pareto-optimal resilient controller placement in SDN-based core networks[C]. The 2013 25th International Teletraffic Congress (ITC), Shanghai, China, 2013: 1–9.
[12] TANHA M, SAJJADI D, and PAN Jianping. Enduring node failures through resilient controller placement for software defined networks[C]. 2016 IEEE Global Communications Conference (GLOBECOM), Washington, USA, 2016: 1–7.
[13] TANHA M, SAJJADI D, RUBY R, et al. Capacity-aware and delay-guaranteed resilient controller placement for software-defined WANs[J]. IEEE Transactions on Network and Service Management, 2018, 15(3): 991–1005. doi: 10.1109/TNSM.2018.2829661
[14] KILLI B P R and RAO S V. Capacitated next controller placement in software defined networks[J]. IEEE Transactions on Network and Service Management, 2017, 14(3): 514–527. doi: 10.1109/TNSM.2017.2720699
[15] ALSHAMRANI A, GUHA S, PISHARODY S, et al. Fault tolerant controller placement in distributed SDN environments[C]. 2018 IEEE International Conference on Communications (ICC), Kansas City, USA, 2018: 1–7.
[16] ALENAZI M J F and ÇETINKAYA E K. Resilient placement of SDN controllers exploiting disjoint paths[J]. Transactions on Emerging Telecommunications Technologies, 2020, 31(2): e3725. doi: 10.1002/ett.3725
[17] SANTOS D, GOMES T, and TIPPER D. SDN controller placement with availability upgrade under delay and geodiversity constraints[J]. IEEE Transactions on Network and Service Management, 2021, 18(1): 301–314. doi: 10.1109/TNSM.2020.3049013
[18] YANG Ze and YEUNG K L. SDN candidate selection in hybrid IP/SDN networks for single link failure protection[J]. IEEE/ACM Transactions on Networking, 2020, 28(1): 312–321. doi: 10.1109/TNET.2019.2959588
[19] GAO Jie, WU Jiangxing, HU Yuxiang, et al. Research of control plane’ anti-attacking in software-defined network based on Byzantine fault-tolerance[J]. Journal of Computer Applications, 2017, 37(8): 2281–2286. doi: 10.11772/j.issn.1001-9081.2017.08.2281 (in Chinese)
[20] XIE Junjie, GUO Deke, QIAN Chen, et al. Validation of distributed SDN control plane under uncertain failures[J]. IEEE/ACM Transactions on Networking, 2019, 27(3): 1234–1247. doi: 10.1109/TNET.2019.2914122
[21] HU Yannan, WANG Wendong, GONG Xiangyang, et al. On reliability-optimized controller placement for software-defined networks[J]. China Communications, 2014, 11(2): 38–54. doi: 10.1109/CC.2014.6821736
[22] ZHANG Lingyu, WANG Ying, LI Wenjing, et al. A survivability-based backup approach for controllers in multi-controller SDN against failures[C]. 2017 19th Asia-Pacific Network Operations and Management Symposium (APNOMS), Seoul, Korea (South), 2017: 100–105.
[23] HU Tao, GUO Zehua, ZHANG Jianhui, et al. Adaptive slave controller assignment for fault-tolerant control plane in software-defined networking[C]. 2018 IEEE International Conference on Communications (ICC), Kansas City, USA, 2018: 1–6.
[24] HU Tao, YI Peng, GUO Zehua, et al. Dynamic slave controller assignment for enhancing control plane robustness in software-defined networks[J]. Future Generation Computer Systems, 2019, 95: 681–693. doi: 10.1016/j.future.2019.01.010
[25] HE Fujun, SATO T, and OKI E. Master and slave controller assignment model against multiple failures in software defined network[C]. ICC 2019 - 2019 IEEE International Conference on Communications (ICC), Shanghai, China, 2019: 1–6.
[26] HE Fujun and OKI E. Load balancing model against multiple controller failures in software defined networks[C]. ICC 2020 - 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 2020: 1–6.
[27] HE Fujun and OKI E. Main and secondary controller assignment with optimal priority policy against multiple failures[J]. IEEE Transactions on Network and Service Management, 2021, 18(4): 4391–4405. doi: 10.1109/TNSM.2021.3064646
[28] MISRA S, SARKAR K, and AHMED N. Blockchain-based controller recovery in SDN[C]. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, Canada, 2020: 1063–1068.
[29] BASU K, HAMDULLAH A, and BALL F. Architecture of a cloud-based fault-tolerant control platform for improving the QoS of social multimedia applications on SD-WAN[C]. 2020 13th International Conference on Communications (COMM), Bucharest, Romania, 2020: 495–500.
[30] LE Zonggang, HUANG Liusheng, and XU Hongli. Failure recovery mechanism of SDN controller based on AMQP[J]. Communications Technology, 2017, 50(3): 487–491. doi: 10.3969/j.issn.1002-0802.2017.03.018 (in Chinese)
[31] REN Xiaodong, AUJLA S G, JINDAL A, et al. Adaptive recovery mechanism for SDN controllers in Edge-Cloud supported FinTech applications[J]. IEEE Internet of Things Journal, 2023, 10(3): 2112–2120. doi: 10.1109/JIOT.2021.3064468
[32] GUILLEN L, IZUMI S, ABE T, et al. A resilient mechanism for multi-controller failure in hybrid SDN-based networks[C]. 2021 22nd Asia-Pacific Network Operations and Management Symposium (APNOMS), Tainan, China, 2021: 285–290.
[33] DHARAM P and DEY M. A mechanism for controller failover in distributed software-defined networks[C]. 2021 8th International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur, Malaysia, 2021: 196–201.
[34] AÇAN F, GÜR G, and ALAGÖZ F. Reactive controller assignment for failure resilience in software defined networks[C]. 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS), Matsue, Japan, 2019: 1–6.
[35] CHEN Jia, CHEN Shihua, CHENG Xin, et al. A deep reinforcement learning based switch controller mapping strategy in software defined network[J]. IEEE Access, 2020, 8: 221553–221567. doi: 10.1109/ACCESS.2020.3043511
[36] MOHAN P M, TRUONG-HUU T, and GURUSAMY M. Primary-backup controller mapping for Byzantine fault tolerance in software defined networks[C]. GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Singapore, 2017: 1–7.
[37] GÜNER S, GÜR G, and ALAGÖZ F. Proactive controller assignment schemes in SDN for fast recovery[C]. 2020 International Conference on Information Networking (ICOIN), Barcelona, Spain, 2020: 136–141.
[38] MOHAN P M, TRUONG-HUU T, and GURUSAMY M. Byzantine-resilient controller mapping and remapping in software defined networks[J]. IEEE Transactions on Network Science and Engineering, 2020, 7(4): 2714–2729. doi: 10.1109/TNSE.2020.2981521
[39] SRIDHARAN V, GURUSAMY M, and TRUONG-HUU T. On multiple controller mapping in software defined networks with resilience constraints[J]. IEEE Communications Letters, 2017, 21(8): 1763–1766. doi: 10.1109/LCOMM.2017.2696006
[40] SRIDHARAN V, LIYANAGE K S K, and GURUSAMY M. Privacy-aware switch-controller mapping in SDN-based IoT networks[C]. 2020 International Conference on Communication Systems & NETworkS (COMSNETS), Bengaluru, India, 2020: 1–6.
[41] AL-TAM F and CORREIA N. On load balancing via switch migration in software-defined networking[J]. IEEE Access, 2019, 7: 95998–96010. doi: 10.1109/ACCESS.2019.2929651
[42] AL-TAM F and CORREIA N. Fractional switch migration in multi-controller software-defined networking[J]. Computer Networks, 2019, 157: 1–10. doi: 10.1016/j.comnet.2019.04.011
[43] DOU Songshi, MIAO Guochun, GUO Zehua, et al. Matchmaker: Maintaining network programmability for Software-Defined WANs under multiple controller failures[J]. Computer Networks, 2021, 192: 108045. doi: 10.1016/j.comnet.2021.108045
[44] GUO Zehua, DOU Songshi, and JIANG Wenchao. Improving the path programmability for software-defined WANs under multiple controller failures[C]. 2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS), Hangzhou, China, 2020: 1–10.
[45] DOU Songshi, GUO Zehua, and XIA Yuanqing. ProgrammabilityMedic: Predictable path programmability recovery under multiple controller failures in SD-WANs[C]. 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), Washington DC, USA, 2021: 461–471.
[46] VAN ADRICHEM N L M, DOERR C, and KUIPERS F A. OpenNetMon: Network monitoring in OpenFlow software-defined networks[C]. 2014 IEEE Network Operations and Management Symposium (NOMS), Krakow, Poland, 2014: 1–8.
[47] TOOTOONCHIAN A, GHOBADI M, and GANJALI Y. OpenTM: Traffic matrix estimator for OpenFlow networks[C]. 11th International Conference on Passive and Active Network Measurement, Zurich, Switzerland, 2010: 201–210.
[48] XIE Junjie, GUO Deke, LI Xiaozhou, et al. Cutting long-tail latency of routing response in software defined networks[J]. IEEE Journal on Selected Areas in Communications, 2018, 36(3): 384–396. doi: 10.1109/JSAC.2018.2815358
[49] YAO Guang, BI Jun, and GUO Luyi. On the cascading failures of multi-controllers in software defined networks[C]. 2013 21st IEEE International Conference on Network Protocols (ICNP), Goettingen, Germany, 2013: 1–2.
[50] SHERWOOD R, GIBB G, YAP K K, et al. FlowVisor: A network virtualization layer[R]. OpenFlow Switch Consortium, Tech. Rep, 2009, 1: 132.
[51] BERA S, MISRA S, and SAHA N. Traffic-aware dynamic controller assignment in SDN[J]. IEEE Transactions on Communications, 2020, 68(7): 4375–4382. doi: 10.1109/TCOMM.2020.2983168
[52] YANG Xuwei, XU Hongli, CHEN Shigang, et al. Indirect multi-mapping for burstiness management in software defined networks[J]. IEEE/ACM Transactions on Networking, 2021, 29(5): 2059–2072. doi: 10.1109/TNET.2021.3078132
[53] Brocade MLX-8 Pe[EB/OL]. [2022-03-29]. https://www.dataswitchworks.com/datasheets/MLX_Series_DS.pdf.
[54] CHN-IX[EB/OL]. [2022-03-29]. http://www.chn-ix.net/.
[55] XU Hongli, HUANG He, CHEN Shigang, et al. Achieving high scalability through hybrid switching in software-defined networking[J]. IEEE/ACM Transactions on Networking, 2018, 26(1): 618–632. doi: 10.1109/TNET.2018.2789339