Physiological Signal-Driven QoE Optimization for Wireless Virtual Reality Transmission
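Abstract: Abrupt resolution changes during virtual reality (VR) streaming significantly degrade users' quality of experience (QoE), especially when switching from high to low resolution. Existing QoE models and transmission schemes do not adequately address the perceptual impact of such changes. To fill this gap, this paper proposes a novel physiological signal-driven framework for QoE modeling and optimization. The framework draws on users' electroencephalogram (EEG), electrocardiogram (ECG), and electrodermal activity to capture the temporal dynamics between physiological responses and resolution changes in VR streaming, so that both the benefit of a resolution increase and the cost of a resolution decrease can be quantified accurately. By integrating the proposed QoE model into the radio access network (RAN) under a deep reinforcement learning (DRL) framework, the paper realizes an adaptive transmission strategy that dynamically allocates radio resources to mitigate short-term channel fluctuations and adjusts frame resolution according to channel variations caused by user mobility. By prioritizing long-term resolution and minimizing abrupt switches, the proposed scheme achieves an 88.7% improvement in resolution over the baseline scheme and reduces the resolution switching frequency by 81.0%. Experimental results demonstrate the effectiveness of the physiological signal-driven strategy and highlight the potential of edge artificial intelligence for immersive media services.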
Keywords:
- Wireless virtual reality
- Physiological signals
- Quality of experience (QoE)
- Deep reinforcement learning (DRL)
Abstract:
Objective: Virtual Reality (VR) has emerged as a transformative medium for immersive digital experiences, driven by its ability to deliver high-resolution 360° video with ultra-low motion-to-photon latency. However, the dependence of VR applications on wireless transmission introduces significant challenges. Uncompressed data rates exceeding 1 Gbps and latency thresholds below 20 ms place severe demands on network infrastructure. In mobile scenarios, channel fluctuations and user mobility frequently compromise service continuity, leading to abrupt changes in video resolution. While traditional Quality of Service (QoS) metrics such as bandwidth, jitter, and packet loss provide necessary network insights, they fail to capture the user's subjective satisfaction. Current Quality of Experience (QoE) models and adaptive bitrate (ABR) algorithms often rely on symmetric metrics such as the Mean Opinion Score (MOS), neglecting the psychological reality that users perceive quality degradation differently from quality improvement. Specifically, the negative impact of a sudden resolution down-switch on user immersion is disproportionately larger than the positive impact of an up-switch, a phenomenon rooted in behavioral psychology but largely overlooked in existing transmission schemes. Furthermore, the structural separation between Radio Access Network (RAN) resource provisioning and application-layer bitrate adaptation often results in mismatched optimization, leading to video quality oscillation and resource underutilization. To bridge these gaps, this study aims to establish a quantifiable link between physiological reactions and resolution changes, and then to use this insight to drive a joint optimization framework.
The primary objective is to develop a novel physiological signal-driven QoE framework, integrated with deep reinforcement learning (DRL), that achieves adaptive transmission maximizing user immersion while mitigating the adverse effects of resolution fluctuation in resource-constrained wireless networks.

Methods: A two-phase methodology is adopted, comprising physiological signal analysis and joint optimization framework development. A rigorous experiment quantified the impact of VR resolution changes on human perception. Nineteen healthy subjects participated in a viewing task using an eye-tracking VR headset combined with a 32-channel wireless EEG system and galvanic skin response (GSR) sensors. Subjects viewed natural scene videos in which the resolution level (8K, 4K, 1080P, 720P, 480P) switched randomly every 8 seconds. The collected EEG signals were preprocessed via independent component analysis and band-pass filtering. Event-related potentials (ERPs) were analyzed, focusing on the N200 component in temporal and occipital regions, which reflects visual processing and attention allocation. A Linear Discriminant Analysis classifier distinguished response types. The analysis investigated the asymmetry between resolution upgrading and downgrading, as well as sensitivity to the magnitude of resolution jumps. Based on the physiological signal findings, a novel QoE model was formulated that introduces penalty terms for resolution degradation and large-scale switching, weighting them more heavily than upgrade rewards to represent mathematically the user's aversion to quality drops. This physiological signal-driven QoE model was integrated into a mobile edge computing environment via a dual-timescale DRL framework. The architecture decouples control into two cooperative agents: the Scheduling & Utility (SU) agent and the Resolution Scaling (RS) agent. The SU agent operates on a fast millisecond timescale and is responsible for real-time wireless resource allocation.
It uses Gated Recurrent Units to extract temporal features from channel state information and transmission history, dynamically allocating bandwidth to maximize frame delivery success rates and ensure fairness while respecting VR frame deadlines. The RS agent operates on a slower frame-level timescale and determines the resolution of subsequent video frames. Its decisions are guided by the physiological signal-driven reward function, which penalizes actions that trigger negative physiological responses (e.g., sharp resolution drops) unless they are mandated by channel deterioration. Proximal Policy Optimization was selected as the optimization algorithm for both agents because of its stability in continuous and discrete action spaces. Extensive simulations used a 3GPP-based wireless channel module incorporating user mobility, shadow fading, and path loss to create a realistic dynamic network environment.

Results and Discussions: The results of the physiological study and the subsequent network simulations validate both the theoretical premise and the practical efficacy of the proposed framework. In the physiological signal analysis, significant N200 ERP components were observed approximately 200 ms after resolution changes. Crucially, the amplitude of the N200 response was significantly larger (p < 0.001) during resolution downgrades than during upgrades, confirming the hypothesis that users are more sensitive to quality deterioration. Furthermore, “large jumps” in resolution (e.g., 8K to 1080P) triggered stronger neural responses and more concentrated energy in the occipital region than minor adjustments did. The GSR data corroborated these findings, with a dual-path model achieving an average Area Under Curve (AUC) of 78.10% in classifying user arousal states related to quality fluctuation.
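The asymmetry between downgrades and upgrades described above can be made concrete with a small sketch of a per-frame reward. This is an illustration under stated assumptions, not the paper's actual QoE formula: the function name `qoe_reward`, the weights `w_up`, `w_down`, and `w_jump`, the use of the ladder index as base utility, and the rule that jumps of two or more levels incur an extra penalty are all hypothetical. Only the resolution ladder itself comes from the experiment.

```python
# Resolution ladder used in the viewing experiment, lowest to highest.
LEVELS = ["480P", "720P", "1080P", "4K", "8K"]


def qoe_reward(prev: int, curr: int,
               w_up: float = 0.5,
               w_down: float = 1.5,
               w_jump: float = 1.0) -> float:
    """Hypothetical per-frame reward for moving from ladder index prev to curr.

    The weights are placeholders chosen so that w_down > w_up, mirroring the
    finding that downgrades are perceived more strongly than upgrades.
    """
    if not (0 <= prev < len(LEVELS) and 0 <= curr < len(LEVELS)):
        raise ValueError("ladder index out of range")

    step = curr - prev
    reward = float(curr)  # assumed base utility: higher level, higher reward

    if step > 0:
        reward += w_up * step        # modest bonus for an upgrade
    elif step < 0:
        reward -= w_down * (-step)   # heavier penalty for a downgrade

    # Assumed extra penalty for large-scale switching (two or more levels).
    if abs(step) >= 2:
        reward -= w_jump * (abs(step) - 1)

    return reward


if __name__ == "__main__":
    for prev, curr in [(3, 3), (2, 3), (3, 2), (4, 2)]:
        print(LEVELS[prev], "->", LEVELS[curr], qoe_reward(prev, curr))
```

With these placeholder weights, a one-level downgrade costs more than a one-level upgrade gains, so a reward-maximizing agent would tend to hold a sustainable resolution rather than oscillate.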
In the network performance evaluation, the proposed physiological signal-driven DRL framework was compared against several baseline schemes, including proportional-fair (PF) scheduling, equal resource allocation, and traditional congestion control mechanisms (e.g., SCReAM). The training curves showed that the dual-agent system converges, learning to coordinate resource provisioning with resolution decisions. The SU agent smoothed short-term channel fluctuations, providing a stable capacity foundation that allowed the RS agent to make better-informed resolution choices. Quantitative analysis revealed that the proposed scheme achieved an 88.7% improvement in average video resolution over the equal-resource baseline. More importantly, resolution switching frequency was reduced by 81.0%. This reduction is critical because frequent switching, especially downward switching, induces user discomfort, as the physiological signal study demonstrated. By prioritizing long-term resolution stability and penalizing abrupt drops via the physiological signal-driven reward function, the system avoided the “ping-pong” effect seen in traditional ABR algorithms. Compared with schemes using different penalty weights, the proposed method achieved the best balance: it avoided both the conservatism of high penalties (lower average resolution) and the instability of low penalties (jagged visual experiences). The joint optimization allocated resources preferentially to users with critical frame deadlines or a risk of perceivable quality drops, maintaining a frame delivery success rate exceeding 99%.

Conclusions: This paper addresses the fundamental conflict between the instability of wireless channels and the human need for seamless visual consistency in VR streaming. Through a physiological signal-driven approach, the asymmetric impact of resolution changes on user experience was quantified, challenging the symmetric assumptions of traditional QoE models.
The integration of this physiological insight into a dual-timescale DRL framework enables the Radio Access Network to go beyond simple throughput optimization. Instead, wireless resource allocation facilitates stable application-layer adaptation, and application demands guide resource scheduling. The proposed solution enhances the immersive experience by maximizing average resolution while minimizing the physiologically disruptive effects of sudden quality degradation. The reduction in resolution switching frequency by over 80% demonstrates the system's ability to shield users from network variability. This research highlights the potential of Edge AI to determine resource allocation based on human perception rather than solely on network statistics. Future work could extend the QoE model to include further factors such as motion-to-photon latency and cyber-sickness, and address individual differences in sensitivity through personalized models. Additionally, deploying such physiological signal-driven frameworks in real-world networks must address privacy concerns, potentially through Federated Learning, so that biometric data are processed locally while global network policies are optimized. Ultimately, this work provides a blueprint for human-centric immersive networking, shifting the focus from Quality of Service to a physiologically validated Quality of Experience.
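The two headline metrics, average resolution and resolution switching frequency, can be computed from a per-frame resolution trace. The sketch below is a minimal illustration under assumptions: resolution is represented as a ladder index (0 = 480P up to 4 = 8K), the helper names are hypothetical, and the two traces are invented toy data, not results from the paper.

```python
from typing import Sequence


def mean_level(trace: Sequence[int]) -> float:
    """Average ladder index over a trace of per-frame resolution levels."""
    if not trace:
        raise ValueError("trace must not be empty")
    return sum(trace) / len(trace)


def switch_rate(trace: Sequence[int]) -> float:
    """Fraction of consecutive frame pairs whose resolution differs."""
    if len(trace) < 2:
        raise ValueError("need at least two frames")
    switches = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return switches / (len(trace) - 1)


def relative_change(new: float, base: float) -> float:
    """Percentage change of new relative to base (negative means reduction)."""
    if base == 0:
        raise ValueError("baseline must be non-zero")
    return (new - base) / base * 100.0


if __name__ == "__main__":
    # Toy traces (hypothetical): an oscillating baseline and a steadier policy.
    baseline = [1, 3, 1, 3, 1, 3, 1, 3, 1]
    proposed = [3, 3, 3, 3, 4, 4, 4, 4, 4]

    print("resolution change (%):",
          relative_change(mean_level(proposed), mean_level(baseline)))
    print("switch-rate change (%):",
          relative_change(switch_rate(proposed), switch_rate(baseline)))
```

Measuring resolution by ladder index is only one possible choice; pixel count or bitrate would give different percentages for the same traces.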