Adaptive Streaming of Stereoscopic Panoramic Video Based on Reinforcement Learning
-
Abstract: Stereoscopic panoramic video transmission currently lacks an effective adaptive streaming method, and applying traditional panoramic adaptive streaming strategies to binocular stereoscopic panoramic video doubles the transmitted data and demands very high bandwidth. To address this, this paper proposes an asymmetric-transmission adaptive streaming method for stereoscopic panoramic video based on multi-agent reinforcement learning, which responds to network bandwidth fluctuations in real time. First, because the human eye favors salient regions of a video, each tile in the left and right viewpoints contributes differently to the perceived quality of the stereoscopic video; a tile-based method for predicting the viewing probability of the left and right viewpoints is therefore proposed. Second, a multi-agent reinforcement learning framework based on Actor-Critic is designed to perform joint rate control of the left and right viewpoints. Finally, a reward function is designed according to the model structure and the principle of binocular suppression. Experimental results show that, compared with traditional adaptive streaming strategies, the proposed method is better suited to tile-based stereoscopic panoramic video transmission and improves the user's Quality of Experience (QoE) under limited bandwidth, offering a new approach to joint rate control for stereoscopic panoramic video.
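To make the idea of a reward shaped by binocular suppression more concrete, the following is a minimal Python sketch and not the reward function used in the paper. It assumes that the perceived quality of a stereo tile is dominated by the better of the two views, that tiles are weighted by their predicted viewing probability, and that stalls and quality switches are penalized. The names `RewardWeights`, `perceived_tile_quality`, and `chunk_reward`, as well as all numeric weights, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RewardWeights:
    # All values below are illustrative assumptions, not taken from the paper.
    dominant_view: float = 0.7  # share of perceived quality from the better view
    rebuffer: float = 4.0       # penalty per second of stalling
    switch: float = 1.0         # penalty per unit of quality change between chunks


def perceived_tile_quality(q_left: float, q_right: float, dominant_view: float) -> float:
    """Binocular-suppression-style fusion: the higher-quality view dominates."""
    hi, lo = max(q_left, q_right), min(q_left, q_right)
    return dominant_view * hi + (1.0 - dominant_view) * lo


def chunk_reward(
    q_left: Sequence[float],
    q_right: Sequence[float],
    view_prob: Sequence[float],
    prev_quality: float,
    rebuffer_s: float,
    weights: Optional[RewardWeights] = None,
) -> Tuple[float, float]:
    """Return (reward, perceived quality) for one chunk of tiled stereo video."""
    w = weights or RewardWeights()
    if not (len(q_left) == len(q_right) == len(view_prob)):
        raise ValueError("per-tile inputs must have the same length")
    # Probability-weighted perceived quality over all tiles.
    quality = sum(
        p * perceived_tile_quality(l, r, w.dominant_view)
        for l, r, p in zip(q_left, q_right, view_prob)
    )
    reward = quality - w.rebuffer * rebuffer_s - w.switch * abs(quality - prev_quality)
    return reward, quality


if __name__ == "__main__":
    # Two tiles: the likely-viewed tile is sent asymmetrically (left 4.0, right 2.0).
    print(chunk_reward([4.0, 1.0], [2.0, 1.0], [0.8, 0.2], prev_quality=2.0, rebuffer_s=0.5))
```

Under these assumed weights, an asymmetric pair of qualities (4.0, 2.0) scores higher than a symmetric pair (3.0, 3.0) with the same average, which is the kind of behavior that would encourage an agent to allocate bitrate unevenly between the two views.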
-
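Table 1. Running time and viewpoint prediction accuracy

| Method | Static saliency extraction | Dynamic saliency extraction | Disparity extraction | Total time | Prediction accuracy |
| --- | --- | --- | --- | --- | --- |
| Plato | – | – | – | 67.4 ms | 0.89 |
| Proposed | 4.2 ms | 10.3 ms | 23.7 ms | 121.6 ms | 0.91 |
-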
[1] GAO Yuan, LIU Dejian, HUANG Zhenzhen, et al. The core factors and challenges of virtual reality technology enhanced learning[J]. e-Education Research, 2016, 37(10): 77–87, 103.
[2] CISCO. Cisco visual networking index: Global mobile data traffic forecast update, 2017-2022[EB/OL]. https://s3.amazonaws.com/media.mediapost.com/uploads/CiscoForecast.pdf, 2019.
[3] HUANG Jingwei, CHEN Zhili, CEYLAN D, et al. 6-DOF VR videos with a single 360-camera[C]. 2017 IEEE Virtual Reality, Los Angeles, USA, 2017: 37–44.
[4] JIANG Xiaolan, CHIANG Yihan, ZHAO Yang, et al. Plato: Learning-based adaptive streaming of 360-Degree videos[C]. 2018 IEEE 43rd Conference on Local Computer Networks, Chicago, USA, 2018: 393–400.
[5] KAN Nuowen, ZOU Junni, TANG Kexin, et al. Deep reinforcement learning-based rate adaptation for adaptive 360-Degree video streaming[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 2019: 4030–4034.
[6] NAIK D, CURCIO I D D, and TOUKOMAA H. Optimized viewport dependent streaming of stereoscopic omnidirectional video[C]. The 23rd Packet Video Workshop, Amsterdam, Netherlands, 2018: 37–42.
[7] CURCIO I D D, NAIK D, TOUKOMAA H, et al. Subjective quality of spatially asymmetric omnidirectional stereoscopic video for streaming adaptation[C]. First International Conference on Smart Multimedia, Toulon, France, 2018: 417–428.
[8] CURCIO I D D, TOUKOMAA H, and NAIK D. Bandwidth reduction of omnidirectional viewport-dependent video streaming via subjective quality assessment[C]. The 2nd International Workshop on Multimedia Alternate Realities, Mountain View, USA, 2017: 9–14.
[9] XU Guisen, WANG Yueming, WANG Zhenyu, et al. Asymmetric representation for 3D panoramic video[C]. 18th Pacific-Rim Conference on Multimedia, Harbin, China, 2018: 683–690.
[10] CHANG Yongjun and KIM M. Binocular suppression-based stereoscopic video coding by joint rate control with KKT conditions for a hybrid video codec system[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2015, 25(1): 99–111. doi: 10.1109/TCSVT.2014.2330658
[11] YANG Fuxing, SUN Bowen, and XIA Jin. Study on the panoramic video transmission based on DASH[J]. Wireless Internet Technology, 2018, 15(3): 25–28. doi: 10.3969/j.issn.1672-6944.2018.03.010
[12] KÖPÜKLÜ O, KOSE N, GUNDUZ A, et al. Resource efficient 3D convolutional neural networks[C]. IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Korea (South), 2019: 1910–1919.
[13] LAGOUDAKIS M G and PARR R. Least-squares policy iteration[J]. Journal of Machine Learning Research, 2003, 4: 1107–1149.
[14] BAN Yixuan, XIE Lan, XU Zhimin, et al. An optimal spatial-temporal smoothness approach for tile-based 360-Degree video streaming[C]. 2017 IEEE Visual Communications and Image Processing, St. Petersburg, USA, 2017: 1–4.
[15] BATTISTI F, CARLI M, LE CALLET P, et al. Toward the assessment of quality of experience for asymmetric encoding in immersive media[J]. IEEE Transactions on Broadcasting, 2018, 64(2): 392–406. doi: 10.1109/TBC.2018.2828607
[16] https://github.com/rao567/3dvideo.
[17] CORBILLON X, DE SIMONE F, and SIMON G. 360-Degree video head movement dataset[C]. The 8th ACM on Multimedia Systems Conference, Taipei, China, 2017: 199–204.
[18] VAN DER HOOFT J, PETRANGELI S, WAUTERS T, et al. HTTP/2-based adaptive streaming of HEVC video over 4G/LTE networks[J]. IEEE Communications Letters, 2016, 20(11): 2177–2180.
[19] RACA D, LEAHY D, SREENAN C J, et al. Beyond throughput, the next generation: A 5G dataset with channel and context metrics[C]. The 11th ACM Multimedia Systems Conference, Istanbul, Turkey, 2020: 303–308.
[20] YOUTUBE. Recommended upload encoding settings[EB/OL]. https://yongqiang.blog.csdn.net/article/details/103602709, 2019.
[21] NGUYEN D V, TRAN H T T, PHAM A T, et al. An optimal tile-based approach for viewport-adaptive 360-Degree video streaming[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(1): 29–42. doi: 10.1109/JETCAS.2019.2899488
[22] SAYGILI G, GURLER C G, and TEKALP A M. Evaluation of asymmetric stereo video coding and rate scaling for adaptive 3D video streaming[J]. IEEE Transactions on Broadcasting, 2011, 57(2): 593–601. doi: 10.1109/TBC.2011.2131450