Adaptive Streaming of Stereoscopic Panoramic Video Based on Reinforcement Learning

LAN Chengdong; RAO Yingjie; SONG Caixia; CHEN Jian

doi:10.11999/JEIT200908

cstr: 32379.14.JEIT200908

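Author biographies:
LAN Chengdong: male, born in 1981, associate professor. Research interests: video coding and processing, artificial intelligence, and multimedia network transmission.

RAO Yingjie: male, born in 1994, master's student. Research interests: multimedia network transmission, panoramic video coding and decoding, and machine learning.

SONG Caixia: female, born in 1996, master's student. Research interests: image reconstruction, panoramic video coding and decoding, and deep learning.

CHEN Jian: female, born in 1981, associate professor. Research interests: video coding and processing.

Corresponding author:
CHEN Jian　chenjian-fzu@163.com

CLC number: TN919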
Publication history
- Received: 2020-10-23
- Revised: 2022-01-05
- Accepted: 2022-01-14
- Available online: 2022-02-02
- Published: 2022-04-18

Adaptive Streaming of Stereoscopic Panoramic Video Based on Reinforcement Learning

LAN Chengdong^{1, 2},
RAO Yingjie^{1, 2},
SONG Caixia^{1, 2},
CHEN Jian^{1}

1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
2. Fujian Provincial Key Laboratory of Media Information Intelligent Processing and Wireless Transmission, Fuzhou 350108, China

Funds: The National Natural Science Foundation of China (62001117), Fujian Province Natural Science Foundation (2017J01757)

Abstract: Effective adaptive streaming methods for stereoscopic panoramic video are currently lacking, and applying traditional panoramic video adaptive streaming strategies to binocular stereoscopic panoramic video doubles the transmitted data and demands very high bandwidth. To address these problems, this paper proposes an asymmetric-transmission adaptive streaming method for stereoscopic panoramic video based on multi-agent reinforcement learning, which responds to network bandwidth fluctuations in real time. First, because human viewers favor the salient regions of a video, each tile in the left and right viewpoints contributes differently to the perceived quality of the stereoscopic video; on this basis, a tile-based method is proposed for predicting the viewing probability of the left and right viewpoints. Second, a multi-agent reinforcement learning framework based on the Actor-Critic architecture is designed to jointly control the bitrates of the left and right viewpoints. Finally, a reward function is designed according to the model structure and the principle of binocular suppression. Experimental results show that, compared with traditional adaptive streaming strategies, the proposed method is better suited to tile-based stereoscopic panoramic video transmission and improves the user's Quality of Experience (QoE) under limited bandwidth, offering a new approach to joint rate control for stereoscopic panoramic video.
Keywords:
- Stereo panoramic video transmission
- Multi-agent reinforcement learning
- Viewpoint prediction
- Joint rate control
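
The abstract names three ingredients of the reward: per-tile viewing probabilities, joint control of the left and right viewpoints, and binocular suppression. The sketch below is only an illustration of how such a reward might be assembled, not the reward function defined in the paper. It assumes a linear QoE model, and every name and constant in it (`perceived_tile_quality`, `segment_reward`, `dominance`, `rebuffer_penalty`, `smooth_penalty`) is hypothetical. Binocular suppression is approximated by letting the better-quality view dominate the perceived quality of a tile.

```python
from typing import Sequence, Tuple


def perceived_tile_quality(q_left: float, q_right: float,
                           dominance: float = 0.7) -> float:
    """Blend the two views' quality, weighting the better view more heavily.

    `dominance` is a hypothetical weight in [0.5, 1]; 0.5 is a plain average.
    """
    better, worse = max(q_left, q_right), min(q_left, q_right)
    return dominance * better + (1.0 - dominance) * worse


def segment_reward(view_probs: Sequence[float],
                   q_left: Sequence[float],
                   q_right: Sequence[float],
                   rebuffer_s: float,
                   prev_quality: float,
                   rebuffer_penalty: float = 4.0,
                   smooth_penalty: float = 1.0) -> Tuple[float, float]:
    """Return (reward, quality) for one segment under an assumed linear QoE model."""
    # Probability-weighted quality over tiles: likely-viewed tiles matter more.
    quality = sum(p * perceived_tile_quality(l, r)
                  for p, l, r in zip(view_probs, q_left, q_right))
    # Penalise stalls and abrupt quality changes between segments.
    reward = (quality
              - rebuffer_penalty * rebuffer_s
              - smooth_penalty * abs(quality - prev_quality))
    return reward, quality


if __name__ == "__main__":
    # Two tiles; the left view of the first tile is sent at a higher quality level.
    reward, quality = segment_reward([0.5, 0.5], [4.0, 2.0], [2.0, 2.0],
                                     rebuffer_s=0.0, prev_quality=2.7)
    print(f"quality={quality:.2f}, reward={reward:.2f}")
```

Under this assumed model, lowering the quality of one view in a tile costs less perceived quality than lowering both, which is the intuition behind asymmetric transmission under limited bandwidth.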

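Figure 1 System architecture of DASH-based stereoscopic panoramic video streaming

Figure 2 Tile-based viewpoint prediction probability model

Figure 3 Algorithm structure

Figure 4 4G and 5G bandwidth traces

Figure 5 Performance comparison of the algorithms

Figure 6 CDF comparison of the algorithms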

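Table 1 Timing test and viewpoint prediction accuracy

| Method | Static saliency extraction | Dynamic saliency extraction | Disparity extraction | Total time | Prediction accuracy |
| --- | --- | --- | --- | --- | --- |
| Plato | – | – | – | 67.4 ms | 0.89 |
| Proposed | 4.2 ms | 10.3 ms | 23.7 ms | 121.6 ms | 0.91 |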
