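Graph Representation Learning-Driven Adaptive Transmission Scheme for Point Cloud Video Streaming

Liu Wei, Chen Ruiyang, Wang Xi, Zhang Jiawei, Xu Jing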

Citation: LIU Wei, CHEN Ruiyang, WANG Xi, ZHANG Jiawei, XU Jing. Graph representation learning-driven adaptive transmission scheme for point cloud video streaming[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251084


doi: 10.11999/JEIT251084 cstr: 32379.14.JEIT251084
Funding: Fund 1, Fund 2, Fund 3
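Details
    About the authors:

    Liu Wei: male, Ph.D., professor. Research interests: multimedia networks, Internet of Things, intelligent sensing.

    Chen Ruiyang: male, master's student. Research interests: multimedia networks, immersive media.

    Wang Xi: female, Ph.D. student. Research interests: multimedia networks, immersive video.

    Zhang Jiawei: male, master's student. Research interests: multimedia networks, point cloud video.

    Xu Jing: male, Ph.D., associate professor. Research interests: low-power Internet of Things, intelligent sensing, unmanned systems.

    Corresponding author:

    Wang Xi, twx@hust.edu.cn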

  • CLC number: TN919

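  • Abstract: Guaranteeing user quality of experience (QoE) for point cloud video streaming over bandwidth-limited networks is difficult. To address this, this paper proposes a QoE optimization framework that combines viewport prediction with dynamic quality allocation. To improve prediction accuracy, a viewport prediction scheme based on graph representation learning is designed. It explicitly models the user's spatial context and movement patterns in the 3D scene and fuses the learned spatial prior with the user's historical trajectory, which improves the long-term accuracy of six-degree-of-freedom (6DoF) viewport prediction. To allocate quality intelligently, a dynamic quality allocation scheme based on contextual bandits is proposed. Using real-time context information, it adaptively assigns a quality level to each spatial tile under the bandwidth constraint, with the aim of improving long-term cumulative QoE. Simulation results on public datasets show that the proposed scheme significantly outperforms multiple baseline schemes in both viewport prediction accuracy and overall QoE, and that it adapts well and remains stable.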
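  • Figure 1  Heat map of user viewing positions

    Figure 2  Viewport prediction scheme based on graph representation learning

    Figure 3  Viewpoint prediction metrics of different schemes over time

    Figure 4  Average viewport prediction performance of different schemes

    Figure 5  Performance metrics of different schemes when viewing a single segment

    Figure 6  Performance metrics of different schemes under fluctuating bandwidth

    Figure 7  Convergence of the confidence intervals for actions at different quality levels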

    Algorithm 1  Context-aware dynamic quality allocation

     Input: predicted viewport $ \hat{F}_{t} $, context list $ \{x_{t,k}\}_{k=1}^{K} $, bandwidth $ B_{t} $
     Output: tile quality allocation list $ \{a_{t,k}\}_{k=1}^{K} $
     1: Partition the tiles: $ \mathcal{P}_{out}, \mathcal{P}_{pred} $
     2: $ b_{in} \leftarrow g_{1}^{t} \cdot B_{t} $, $ b_{out} \leftarrow B_{t} - b_{in} $
     3: Function Allocate$ (\mathcal{K}, b_{budget}) $
     4:  Sort the tile list $ \mathcal{K} $ in descending order of $ u_{t,k,\text{maxlevel}} $
     5:  for all tiles $ c_{t,k} \in \mathcal{K} $ do
     6:   $ a_{t,k} \leftarrow \max\{a \mid \text{size}(c_{t,k}, a) \leq b_{budget}\} $
     7:   $ b_{budget} \leftarrow b_{budget} - \text{size}(c_{t,k}, a_{t,k}) $
     8:   Update LinUCB with ($ x_{t,k}, a_{t,k}, \text{reward}_{t,k} $)
     9:  end for
     10: end Function
     11: $ \text{Allocate}(\{k \mid k \in \hat{F}_{t}\}, b_{in}) $
     12: $ \text{Allocate}(\{k \mid k \notin \hat{F}_{t}\}, b_{out}) $
     13: return $ \{a_{t,k}\}_{k=1}^{K} $
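The greedy budget pass in the listing above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `allocate`, `allocate_frame`, `score`, `size`, and `in_share` are hypothetical, `score[k]` stands in for the per-tile ranking value, `in_share` stands in for the in-viewport bandwidth share, level 0 is assumed to mean "tile not sent" when no level fits the remaining budget, and the LinUCB update in step 8 is omitted.

```python
from typing import Callable, Dict, Iterable, Set


def allocate(tiles: Iterable[int], budget: float, score: Dict[int, float],
             size: Callable[[int, int], float], max_level: int) -> Dict[int, int]:
    """Visit tiles by descending score; give each the highest level that fits."""
    allocation: Dict[int, int] = {}
    for k in sorted(tiles, key=lambda t: score[t], reverse=True):
        feasible = [a for a in range(1, max_level + 1) if size(k, a) <= budget]
        # Assumption: level 0 (tile skipped) when nothing fits the budget.
        level = max(feasible) if feasible else 0
        allocation[k] = level
        if level > 0:
            budget -= size(k, level)
    return allocation


def allocate_frame(all_tiles: Iterable[int], predicted: Set[int], bandwidth: float,
                   in_share: float, score: Dict[int, float],
                   size: Callable[[int, int], float], max_level: int = 5) -> Dict[int, int]:
    """Split the bandwidth between predicted-viewport tiles and the rest."""
    all_tiles = list(all_tiles)
    b_in = in_share * bandwidth
    b_out = bandwidth - b_in
    inside = [k for k in all_tiles if k in predicted]
    outside = [k for k in all_tiles if k not in predicted]
    result = allocate(inside, b_in, score, size, max_level)
    result.update(allocate(outside, b_out, score, size, max_level))
    return result


# Toy example: four tiles, each level costs `level` bandwidth units (hypothetical).
score = {0: 0.9, 1: 0.5, 2: 0.4, 3: 0.8}
result = allocate_frame(range(4), predicted={0, 1}, bandwidth=12.0,
                        in_share=0.75, score=score, size=lambda k, a: a)
print(result)
```

In this toy run the in-viewport budget of 9 units is spent on tiles 0 and 1, while the out-of-viewport budget of 3 units covers only tile 3, so tile 2 falls back to the assumed level 0.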

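    Table 1  Experimental parameter settings

    Parameter                                    Value
    Input/output window ($ W/\tau $)             20/10 frames
    LSTM hidden layer structure                  (128, 64)
    Maximum quality level $ \text{maxlevel} $    5
    LinUCB exploration parameter $ \lambda $     0.15
    QoE weight factors $ \gamma $                (0.3, 1, 1)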
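To make the role of the LinUCB exploration parameter concrete, here is a minimal sketch of one arm of the standard disjoint LinUCB model, written in plain Python. It is not the authors' code: the class name `LinUCBArm`, the two-dimensional context, and the reward of 1.0 are hypothetical, and only the value 0.15 comes from the parameter table. The score is the ridge-regression estimate plus the exploration parameter times the confidence width, and the inverse matrix is maintained with the Sherman-Morrison identity.

```python
import math
from typing import List


class LinUCBArm:
    """One arm (for example, one quality level) of disjoint LinUCB."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity matrix, so its inverse is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x: List[float]) -> List[float]:
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x: List[float], explore: float) -> float:
        theta = self._a_inv_times(self.b)   # ridge estimate A^{-1} b
        v = self._a_inv_times(x)            # A^{-1} x
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(vi * xi for vi, xi in zip(v, x)), 0.0))
        return mean + explore * width

    def update(self, x: List[float], reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1}x)(A^{-1}x)^T / (1 + x^T A^{-1} x)
        v = self._a_inv_times(x)
        denom = 1.0 + sum(vi * xi for vi, xi in zip(v, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        for i in range(len(x)):
            self.b[i] += reward * x[i]


arm = LinUCBArm(dim=2)
x = [1.0, 0.0]                      # hypothetical context vector
before = arm.ucb(x, explore=0.15)   # no data yet: the score is pure exploration bonus
arm.update(x, reward=1.0)           # hypothetical observed reward
after = arm.ucb(x, explore=0.15)
print(before, after)
```

A larger exploration parameter widens the bonus for rarely tried actions; a small value such as 0.15 lets the score be dominated by the estimated reward once a few observations are in.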
Figures (7) / Tables (2)
Publication history
  • Revised:  2026-04-08
  • Accepted:  2026-04-08
  • Published online:  2026-04-25
