Resource Allocation for RIS-aided Cross-Modal Communications
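Abstract: For cross-modal scenarios in which video and haptic services coexist, this paper constructs a Reconfigurable Intelligent Surface (RIS)-aided coexistence network slicing system to improve the transmission rate and reliability of video and haptic services. To reduce the resource loss that haptic traffic imposes on video traffic through puncturing, a dynamic passive beamforming scheme is proposed that allows the RIS to be reconfigured from one time slot to the next. Based on this scheme, an optimization problem is formulated that maximizes the video transmission rate while ensuring that the latency and reliability constraints of haptic transmission are satisfied, so as to meet cross-modal coexistence requirements and allocate resources sensibly. To solve the problem, it is modeled as a Markov Decision Process (MDP), and the Deep Deterministic Policy Gradient (DDPG) algorithm is used to jointly optimize the transmission resources for video and haptic data. Simulation results show that, compared with existing schemes, the proposed scheme improves the video sum rate by about 66.67% while guaranteeing the reliability of haptic transmission.
Extended abstract: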
Objective: The rapid development of digital and intelligent technologies has driven the increasing demand for cross-modal communication systems to support a wide range of applications, such as high-bandwidth video streaming, ultra-reliable low-latency haptic interactions, and immersive virtual reality experiences. These applications require the concurrent transmission of heterogeneous services, each with distinct and often conflicting resource demands. For instance, video services necessitate high data rates and large bandwidth allocations for smooth playback, while haptic services require ultra-low latency (<0.3 ms) and high reliability (>99.999%) for real-time interaction. Existing resource allocation schemes, typically designed for single-service scenarios or static optimization, do not effectively address the dynamic nature of wireless channels or the stringent requirements of multi-service coexistence. This paper proposes a dynamic resource allocation framework that utilizes Reconfigurable Intelligent Surfaces (RIS) to optimize the transmission efficiency of video services and the reliability of haptic services, thereby enhancing spectrum utilization and improving the Quality of Experience (QoE) in cross-modal communication systems.
Methods: To address the resource competition between video and haptic services, this paper proposes an RIS-aided network slicing architecture. The RIS dynamically adjusts its phase shifts to reshape the wireless propagation environment, improving channel gain and reducing interference. A puncturing-based resource sharing mechanism is introduced, enabling haptic traffic to temporarily use resources allocated to video services during burst arrivals. This mechanism ensures the stringent latency and reliability requirements of haptic services are met without significantly affecting video service performance.
The optimization problem is formulated as a Mixed-Integer Nonlinear Programming (MINLP) task, with the objective of maximizing the video service rate while satisfying the constraints of haptic services. To tackle the complexity of joint RIS phase optimization and resource allocation, the problem is modeled as a Markov Decision Process (MDP) with continuous state and action spaces. A Deep Deterministic Policy Gradient (DDPG) algorithm is employed, integrating actor-critic networks, experience replay, and target networks to learn optimal policies. The actor network generates decisions regarding resource block allocation, RIS phase shifts, and puncturing ratios, while the critic network evaluates the long-term reward, defined as the weighted sum of video throughput and haptic service satisfaction.
Results and Discussions: Simulation results demonstrate the effectiveness of the proposed scheme. Compared to the HMSA scheme, the proposed method significantly improves the total transmission rate for users, particularly under varying Base Station (BS) power levels (Fig. 4). The RIS phase optimization scheme outperforms both the random phase and no-RIS scenarios, highlighting the importance of dynamically adjusting RIS reflection coefficients to enhance channel gain (Fig. 5). Furthermore, the average delay of haptic data packets decreases as the number of RIS reflection units increases, and higher BS transmit power further reduces latency, confirming the synergy between RIS deployment and power allocation (Fig. 6). The user sum rate declines as the arrival rate of haptic data packets increases, due to intensified resource competition. However, deploying additional RIS reflection units mitigates this degradation, demonstrating the robustness of RIS-aided resource allocation (Fig. 7).
The convergence behavior of the DDPG algorithm is analyzed, showing faster convergence in low-SNR environments (e.g., P = 0 dBm) compared to high-SNR scenarios (e.g., P = 30 dBm), where reward fluctuations are more pronounced (Fig. 8). Additionally, the learning rate is identified as a key hyperparameter, with a value of 0.001 providing the optimal balance between convergence speed and stability (Fig. 9). These results confirm that the proposed framework enhances video service throughput while ensuring the stringent reliability and low-latency requirements of haptic services, enabling efficient cross-modal resource coexistence.
Conclusions: This work presents an RIS-assisted dynamic resource allocation framework for cross-modal communication systems, effectively addressing the coexistence challenges of video and haptic services. Key innovations include the integration of RIS phase optimization with puncturing-based resource sharing and the application of DDPG to solve high-dimensional MINLP problems. The proposed scheme significantly enhances video throughput and haptic reliability, demonstrating its potential for 6G-enabled immersive applications. Future research will extend this framework to mobile user scenarios, multi-RIS collaborative systems, and multi-service coexistence environments with diverse QoS requirements. Specifically, the study will examine the impact of user mobility on RIS configuration and resource allocation strategies. Additionally, the benefits of deploying multiple RIS units in a coordinated manner will be explored to further enhance system performance and coverage. Finally, the framework will be expanded to support a broader range of services with varying latency, reliability, and bandwidth demands, paving the way for more versatile and efficient cross-modal communication systems.
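1. Introduction
Ultraviolet (UV) light with wavelengths of 200 to 280 nm is strongly absorbed by ozone molecules, so that UV light in this band is attenuated to almost zero near the ground; the band is therefore called the "solar-blind" region[1]. Wireless UV communication mainly uses light in this band as the carrier and conveys information through scattering by atmospheric molecules, aerosols, and other particles[2]. It offers Non-Line-Of-Sight (NLOS) operation, high confidentiality, a low probability of interception, and strong resistance to interference, which makes it particularly suitable for covert military communication under radio silence and in complex environments, for example among UAV formations, naval vessel formations, army armored formations, and missile vehicle convoys[3,4]. However, because UV communication relies on scattering and is subject to deflection by atmospheric turbulence, the atmospheric channel attenuates the signal severely, the received signal is weak, and the path loss is high, all of which limit the transmission range[5]. Extending the range of wireless UV communication through multi-hop relaying has therefore become a research focus.
Multi-hop relay UV communication has already been studied by several authors. Ref. [6] used multi-hop relaying in a UV communication network to extend the communication range and save power. Ref. [7] studied node connectivity in multi-hop relay UV communication and showed that, by suitably adjusting node density, transmit power, and data rate, the probability that isolated relay nodes exist in the network approaches zero. Ref. [8] addressed the short transmission distance caused by the limited transmit power of UV sources and severe atmospheric channel attenuation, and proposed building long-distance UV links through multi-hop relaying, which improved the power utilization of the system. The results of Ref. [9] show that choosing a suitable wireless UV system configuration is essential for improving the performance of multi-hop relay communication systems.
The studies above all assume equidistant node placement, and the optimal number of hops when nodes are randomly distributed has received little attention. To reduce the energy consumption of wireless sensor network nodes, Ref. [10] used UV light as the information carrier and derived, under equidistant placement, an expression for the optimal hop count that minimizes energy consumption. That study did not consider the UV system configuration, although the overall performance of UV communication depends heavily on it, especially on the transmitter and receiver elevation angles[9]. Based on the wireless UV NLOS single-scattering model, this paper therefore studies the relation between the elevation angles and the optimal hop count for multi-hop relay UV communication among randomly distributed vehicles in a convoy, derives an approximate expression relating the two from the channel capacity and the path loss, and analyzes system performance under different elevation angles.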
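2. Wireless UV Multi-hop Relay Communication System Model
2.1 Analysis of the wireless UV NLOS single-scattering model
Strong atmospheric scattering of UV light allows the UV channel to form an NLOS communication link. In practice UV propagation involves multiple scattering, but single scattering dominates over short ranges[11], so this paper takes the single-scattering model as its basis. The model is shown in Fig. 1[12], where $\beta_T$ is the transmitter elevation angle, $\beta_R$ is the receiver elevation angle, $\theta_T$ is the beam divergence angle, $\theta_R$ is the receiver field of view, $V$ is the effective scattering volume, $\theta_S$ is the scattering angle, and $r$ is the baseline distance between transmitter and receiver.
The transmitter TX emits an optical signal into space at angles $\beta_T$ and $\theta_T$. After the signal is scattered within the effective scattering volume $V$, the receiver RX collects it at angles $\beta_R$ and $\theta_R$. The received optical power for single-scattering UV communication is then[12]
$$P_r=\frac{P_t A_r K_s P_s \theta_R \theta_T^2 \sin(\beta_T+\beta_R)}{32\pi^3 r \sin\beta_T\left(1-\cos\frac{\theta_T}{2}\right)}\cdot\exp\left[-\frac{K_e r(\sin\beta_T+\sin\beta_R)}{\sin(\beta_T+\beta_R)}\right] \tag{1}$$
where $P_t$ is the transmitted optical power, $A_r$ is the receiver aperture area, $K_s$ is the atmospheric scattering coefficient, $P_s$ is the phase function of the scattering angle $\theta_S$, and $K_e$ is the atmospheric extinction coefficient, with $K_e=K_a+K_s$, where $K_a$ is the atmospheric absorption coefficient.
The path loss of NLOS UV scattering communication can be expressed as the ratio of transmitted to received optical power, as in Eq. (2):
$$L=\frac{P_t}{P_r}=\frac{32\pi^3 r \sin\beta_T\left(1-\cos\frac{\theta_T}{2}\right)}{A_r K_s P_s \theta_R \theta_T^2 \sin(\beta_T+\beta_R)}\cdot\exp\left[\frac{K_e r(\sin\beta_T+\sin\beta_R)}{\sin(\beta_T+\beta_R)}\right] \tag{2}$$
Eq. (2) applies only when the elevation angles are small; for large elevation angles it is no longer suitable for analyzing NLOS UV scattering communication. In practice, when the communication distance is below 1 km, the simplified path-loss formula is usually used[13]:
$$L=\xi r^{\alpha} \tag{3}$$
where $\xi$ is the path-loss factor and $\alpha$ is the path-loss exponent, both of which depend on the elevation angles.
2.2 Structural model of randomly distributed wireless UV multi-hop relay nodes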
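The multi-hop structure studied in this paper is a one-dimensional $N$-hop network in which the length of each hop is random; the model is shown in Fig. 2. When $r_1=r_2=\cdots=r_N=d/N$, the placement is equidistant. The system has one source node S, one destination node D, and $N-1$ relay nodes $R_i$ ($i=1, 2, \cdots, N-1$), which are randomly placed between S and D. All nodes are assumed to be half-duplex, and the classical $N$-hop time-division decode-and-forward protocol is considered: each relay $R_i$ receives the message sent by $R_{i-1}$, decodes it, and forwards it to $R_{i+1}$. The relay positions are assumed to be independent random variables. Let $X_i$ be the random position of the $i$-th relay between S and D; its probability density function is[14]
$$f(X_i)=\begin{cases}\dfrac{1}{r\delta}, & r\left(i-\dfrac{\delta}{2}\right)\le X_i\le r\left(i+\dfrac{\delta}{2}\right)\\ 0, & \text{otherwise}\end{cases} \tag{4}$$
where $r=d/N$ is the equal-division distance, $d$ is the actual distance from S to D, and $\delta\in[0,1]$ is the random offset range, which characterizes the randomness or uncertainty of the relay positions. Define $r_i=X_i-X_{i-1}$ ($i=2,3,\cdots,N-1$) and, in particular, $r_1=X_1$ and $r_N=d-X_{N-1}$. The cumulative distribution function of $r_i$ is, for $i=1$ or $N$,
$$F(r_i)=\begin{cases}0, & r_i\le r\left(1-\dfrac{\delta}{2}\right)\\ \dfrac{1}{r\delta}\left[r_i-r\left(1-\dfrac{\delta}{2}\right)\right], & r\left(1-\dfrac{\delta}{2}\right)\le r_i\le r\left(1+\dfrac{\delta}{2}\right)\\ 1, & r_i\ge r\left(1+\dfrac{\delta}{2}\right)\end{cases} \tag{5}$$
and, for $i=2, 3, \cdots, N-1$,
$$F(r_i)=\begin{cases}0, & r_i\le r(1-\delta)\\ \dfrac{1}{2}\left(\dfrac{1}{r\delta}\right)^2\left[r_i-r(1-\delta)\right]^2, & r(1-\delta)\le r_i\le r\\ 1-\dfrac{1}{2}\left(\dfrac{1}{r\delta}\right)^2\left[r(1+\delta)-r_i\right]^2, & r\le r_i\le r(1+\delta)\\ 1, & r_i\ge r(1+\delta)\end{cases} \tag{6}$$
3. Optimal Hop Count Analysis for Randomly Distributed Wireless UV Multi-hop Relay Nodes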
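According to the UV NLOS single-scattering model, the signal-to-noise ratio (SNR) of UV communication at the quantum limit is[12]
$$\gamma_{\mathrm{SNR}}=\frac{\eta_f\eta_r P_t\lambda}{2hcBL}=\mu\xi^{-1}r^{-\alpha} \tag{7}$$
where $\mu=\dfrac{\eta_f\eta_r P_t\lambda}{2hcB}$, $\eta_f$ and $\eta_r$ are the filter transmittance and the detection efficiency of the Photomultiplier Tube (PMT), respectively, $\lambda$ is the UV wavelength, $h$ is Planck's constant, $c$ is the speed of light, and $B=\dfrac{K_e c}{2\pi}$ is the bandwidth of the UV channel. The channel capacity of UV NLOS communication follows from Shannon's formula[12], so the spectral efficiency of single-hop UV communication is
$$\eta_{sh}=\log_2\left(1+\mu\xi^{-1}d^{-\alpha}\right) \tag{8}$$
For equidistant $N$-hop transmission from source to destination, assume that only one node transmits at any time, so that there is no interference at the receiver, and that each node conveys the same amount of information in $1/N$ of the time. The spectral efficiency required on each hop is then $N$ times that of single-hop communication, and the spectral efficiency for equidistant placement is
$$\eta_{eq}=\frac{1}{N}\log_2\left(1+\mu\xi^{-1}\left(\frac{d}{N}\right)^{-\alpha}\right) \tag{9}$$
Eq. (9) shows that the spectral efficiency depends on the hop count and on the elevation angles. Using the approximately ideal path routing calculation method[15] and maximizing the spectral efficiency, the approximate optimal hop count for equidistant nodes in a multi-hop relay UV system is
$$N_{op}=\arg\max\,\eta_{eq}\approx\left[\left(\frac{2^{\varepsilon}-1}{\gamma}\right)^{1/\alpha}\right]^{+} \tag{10}$$
where $\varepsilon=\dfrac{\alpha+W(-\alpha e^{-\alpha})}{\ln 2}$ is a constant that depends only on the path-loss exponent, $W(\cdot)$ is the principal branch of the Lambert W function[16], and $\gamma=\mu\xi^{-1}d^{-\alpha}$ is the received SNR in the single-hop case. Eq. (10) shows that the optimal hop count is affected by the elevation angles. Building on the equidistant analysis, the optimal hop count for randomly distributed relays is analyzed next. With random placement, system performance is determined by the longest of the $N$ hops[17], so the spectral-efficiency metric is the average over the longest hop distance, as in Eq. (11):
$$\bar{\eta}=E_{r_{\max}}\left[\frac{1}{N}\log_2\left(1+\mu\xi^{-1}r_{\max}^{-\alpha}\right)\right] \tag{11}$$
where $r_{\max}=\max\limits_{i=1,\cdots,N} r_i$ and $E[\cdot]$ is the expectation operator.
To find the optimal hop count under random placement, a closed-form expression for $\bar{\eta}$ is needed. From the cumulative distribution functions in Eqs. (5) and (6),
$$r\le r_{\max}\le r(1+\delta) \tag{12}$$
Applying Jensen's inequality $E[f(x)]\ge f(E[x])$ to Eq. (11) gives
$$E\left[\log_2\left(1+\mu\xi^{-1}r_{\max}^{-\alpha}\right)\right]\ge\log_2\left(1+\mu\xi^{-1}\left(E[r_{\max}]\right)^{-\alpha}\right) \tag{13}$$
From Eq. (13), together with Eqs. (11) and (12), the lower and upper bounds of $\bar{\eta}$ are
$$\frac{1}{N}\log_2\left(1+\mu\xi^{-1}\left(\frac{d}{N}(1+\delta)\right)^{-\alpha}\right)\le\bar{\eta}\le\frac{1}{N}\log_2\left(1+\mu\xi^{-1}\left(\frac{d}{N}\right)^{-\alpha}\right) \tag{14}$$
Following the same procedure as for the equidistant case, maximizing the lower bound of $\bar{\eta}$ in Eq. (14) gives the approximate optimal hop count for randomly distributed nodes:
$$N'_{op}\approx\left[\left(\frac{2^{\varepsilon}-1}{\gamma'}\right)^{1/\alpha}\right]^{+} \tag{15}$$
where $\gamma'=\mu\xi^{-1}d^{-\alpha}(1+\delta)^{-\alpha}$ is the received SNR for a single hop at the maximum distance under random placement. Under random placement, the optimal hop count given by Eq. (15) is the maximum value for multi-hop relay communication from source to destination. When $\delta=0$, Eq. (15) reduces to Eq. (10) and the optimal hop count reaches its minimum, that is, the spectral efficiency of the UV link is at its maximum.
4. Simulation Results and Analysis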
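As a numerical companion to Eqs. (11) to (14), the following Python sketch draws relay positions from the uniform density of Eq. (4), takes the longest hop, and averages the resulting spectral efficiency by Monte Carlo. It is a minimal illustration under stated assumptions, not the simulation code behind the figures: `snr_gain` stands for the product $\mu\xi^{-1}$, and both its value and the path-loss exponent used in the usage note are hypothetical placeholders; all function names are illustrative.

```python
import math
import random


def sample_hop_lengths(d, n_hops, delta, rng):
    """Draw relay positions from Eq. (4) and return the N hop lengths."""
    r = d / n_hops  # equal-division distance r = d / N
    relays = [rng.uniform(r * (i - delta / 2), r * (i + delta / 2))
              for i in range(1, n_hops)]
    points = [0.0] + relays + [d]  # source, relays, destination
    return [b - a for a, b in zip(points, points[1:])]


def spectral_efficiency(snr_gain, hop_length, alpha, n_hops):
    """(1/N) * log2(1 + mu * xi^-1 * r^-alpha) for a given hop length r."""
    return math.log2(1.0 + snr_gain * hop_length ** (-alpha)) / n_hops


def mean_efficiency(d, n_hops, delta, snr_gain, alpha, trials=5000, seed=0):
    """Monte Carlo estimate of Eq. (11), averaging over the longest hop."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        r_max = max(sample_hop_lengths(d, n_hops, delta, rng))
        total += spectral_efficiency(snr_gain, r_max, alpha, n_hops)
    return total / trials


def efficiency_bounds(d, n_hops, delta, snr_gain, alpha):
    """Lower and upper bounds of Eq. (14)."""
    lower = spectral_efficiency(snr_gain, d * (1 + delta) / n_hops, alpha, n_hops)
    upper = spectral_efficiency(snr_gain, d / n_hops, alpha, n_hops)
    return lower, upper
```

For example, with the hypothetical values `snr_gain = 1400.0` and `alpha = 1.5`, calling `mean_efficiency(500.0, n, 0.2, 1400.0, 1.5)` for each `n` and comparing it with `efficiency_bounds(500.0, n, 0.2, 1400.0, 1.5)` shows the estimate lying between the two bounds, since every sampled longest hop satisfies Eq. (12).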
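4.1 Simulation parameter settings
In a convoy, the spacing between vehicles is usually kept at 30 to 100 m depending on speed, and the overall length of the convoy does not exceed 1 to 3 km[18]. Based on these figures and the optimal hop count analysis, simulations were carried out for both equidistant and random node placement. The main simulation parameters are listed in Table 1.
Table 1  Main system simulation parameters
Parameter                         Value
UV wavelength                     260 nm
PMT detection efficiency          0.3
Filter transmittance              0.6
Absorption coefficient            0.802×10^-3 m^-1
Mie scattering coefficient        0.284×10^-3 m^-1
Rayleigh scattering coefficient   0.266×10^-3 m^-1
Planck's constant h               6.6×10^-34 J·s
4.2 Simulation analysis of the optimal hop count for equidistant nodes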
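When the vehicles in a convoy move at constant speed with equal spacing, the nodes are equidistantly distributed, that is, $\delta=0$. Using the values of $\xi$ and $\alpha$ for different elevation angles[14], with $P_t$ = 30 mW, $d$ = 500 m, and equidistant placement, single-hop, multi-hop, and optimal-hop communication were simulated for three elevation-angle configurations: $\beta_T<\beta_R$, $\beta_T=\beta_R$, and $\beta_T>\beta_R$. For multi-hop communication at $d$ = 500 m, the maximum number of hops was set to 9, corresponding to a convoy of 10 vehicles; the other parameters are as given in Table 1. Figs. 3, 4, and 5 show that, when the transmit power exceeds 18 mW, the optimal hop count obtained for each elevation-angle configuration gives better transmission capability than both single-hop and 9-hop communication. In addition, the spectral efficiency for different elevation angles at the same optimal hop count was compared. Figs. 6(a) and 6(b) show that, at the same optimal hop count, the elevation angles have a considerable effect on transmission capability. Moreover, for the same number of hops, a small transmitter elevation angle combined with a large receiver elevation angle gives the multi-hop UV system better performance.
Building on the equidistant results, the maximum-spectral-efficiency method of this paper is compared with the optimal-energy method. For the $\beta_T<\beta_R$ geometry, the optimal hop counts and the transmission capability of the two methods are compared in Figs. 7 and 8. Fig. 7 shows that the two methods give essentially consistent results. With the transmitter elevation angle fixed and the receiver elevation angle increased, as the communication distance grows the optimal hop count of the maximum-spectral-efficiency method becomes larger by one than that of the optimal-energy method.
Fig. 8 compares the performance at the crossover point of the optimal hop count in Fig. 7 at $d$ = 600 m, where the optimal hop count is 6 for the maximum-spectral-efficiency method and 5 for the optimal-energy method. Fig. 8 shows that, at low transmit power, the maximum-spectral-efficiency method gives better transmission capability than the optimal-energy method while also saving power. Under the equidistant model, once the source-to-destination distance is fixed, choosing an appropriate number of relays according to the elevation angles maximizes the overall transmission capability.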
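The equidistant analysis can be reproduced numerically. The Python sketch below evaluates Eq. (9) for integer hop counts, finds the maximizer by exhaustive search, and compares it with the closed form of Eq. (10). It is a minimal illustration under stated assumptions: the values $\xi$ = 4×10^7 and $\alpha$ = 1.5 in the usage note are hypothetical placeholders (the paper takes them from [14] and does not list them), the extinction coefficient default assumes that $K_s$ is the sum of the Mie and Rayleigh coefficients in the parameter table of Section 4.1, and `lambert_w0` is a small Newton iteration written for this example rather than a library routine. The closed form is meaningful only for $\alpha>1$.

```python
import math


def lambert_w0(z, tol=1e-12, max_iter=100):
    """Principal branch W0(z) for -1/e < z <= 0, by Newton iteration."""
    if not (-1.0 / math.e < z <= 0.0):
        raise ValueError("this sketch only supports -1/e < z <= 0")
    w = 0.0  # start to the right of the root; w * exp(w) is convex there
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def single_hop_snr(p_t, d, xi, alpha, eta_f=0.6, eta_r=0.3,
                   wavelength=260e-9, k_e=1.352e-3, h=6.6e-34, c=3.0e8):
    """gamma = mu * xi^-1 * d^-alpha from Eq. (7).

    k_e = 0.802e-3 + 0.284e-3 + 0.266e-3 m^-1 is an assumption (K_a + K_s,
    with K_s taken as Mie plus Rayleigh scattering).
    """
    bandwidth = k_e * c / (2.0 * math.pi)  # B = K_e * c / (2 * pi)
    mu = eta_f * eta_r * p_t * wavelength / (2.0 * h * c * bandwidth)
    return mu / (xi * d ** alpha)


def equal_hop_efficiency(gamma, alpha, n_hops):
    """Eq. (9), rewritten as (1/N) * log2(1 + gamma * N^alpha)."""
    return math.log2(1.0 + gamma * n_hops ** alpha) / n_hops


def optimal_hops_closed_form(gamma, alpha):
    """Continuous-valued optimum of Eq. (10), before rounding."""
    eps = (alpha + lambert_w0(-alpha * math.exp(-alpha))) / math.log(2.0)
    return ((2.0 ** eps - 1.0) / gamma) ** (1.0 / alpha)


def optimal_hops_brute_force(gamma, alpha, n_max=50):
    """Integer hop count in 1..n_max that maximizes Eq. (9)."""
    return max(range(1, n_max + 1),
               key=lambda n: equal_hop_efficiency(gamma, alpha, n))
```

A possible use is `gamma = single_hop_snr(0.03, 500.0, 4e7, 1.5)` followed by `optimal_hops_closed_form(gamma, 1.5)` and `optimal_hops_brute_force(gamma, 1.5)`. Because the objective in Eq. (9) rises and then falls as $N$ grows when $\alpha>1$, the exhaustive search and the rounded closed-form value should agree to within one hop.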
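4.3 Simulation analysis of the optimal hop count for randomly distributed nodes
While travelling, a convoy adjusts its speed at any time according to road conditions and dispatch information, so it no longer moves at constant speed and equidistant placement no longer holds. This section simulates the optimal hop count under random node placement. The main simulation parameters are as given in Table 1. With transmit power $P_t$ = 30 mW and S-to-D distance $d$ = 500 m, the relation between spectral efficiency and hop count was compared for different elevation angles under different random offset ranges. Figs. 9(a), 9(b), 9(c), and 9(d) show that, in a multi-hop UV system, overall performance declines as $\delta$ increases. For each elevation-angle configuration and each random offset range, there is an optimal hop count that maximizes the lower bound of the overall spectral efficiency. Fig. 9 further shows that a small transmitter elevation angle combined with a large receiver elevation angle gives the multi-hop UV system better transmission capability. To illustrate the practical use of the random placement model, with $P_t$ = 30 mW and $\delta=0.2$ the relation between spectral efficiency and hop count was compared for $\beta_T<\beta_R$ and $\beta_T>\beta_R$ as the transmitter-receiver distance varies, as shown in Fig. 10. Fig. 10 shows that the spectral efficiency has a maximum as the hop count increases and that, as the S-to-D distance grows, once the optimal hop count is reached the spectral efficiency no longer changes much with additional hops. This further indicates that, for long-distance UV communication, more hops do not necessarily give better performance.
5. Conclusion
This paper studied the optimal hop count for vehicles in a convoy under equidistant and random placement. Based on the NLOS UV single-scattering model, the channel capacity, and the path loss, and following the principle of maximizing spectral efficiency, an approximate expression relating the elevation angles to the spectral efficiency was derived. From the analysis of the equidistant optimal hop count, an approximate expression for the optimal hop count under random placement was obtained. Simulation results show that the elevation angles affect the optimal hop count of a multi-hop relay UV system. Each random offset range and each elevation-angle configuration corresponds to a specific optimal hop count. Compared with the optimal-energy method, the maximum-spectral-efficiency method gives better transmission capability when the transmit power is below 25 mW, and also saves power. For long-distance UV communication, choosing a suitable number of relays together with a geometry that uses a small transmitter elevation angle and a large receiver elevation angle improves the transmission capability of multi-hop relay UV communication within the convoy and meets the need for stable, reliable covert communication.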
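Algorithm 1  DDPG algorithm
Initialization: s_1, θ_a, θ_c, θ'_a ← θ_a and θ'_c ← θ_c, experience replay buffer N, random noise N_t
while episode ≤ maximum number of episodes do
  while t ≤ T do
    • Given state s_t and random noise N_t, compute the action through the actor network: a_t = μ(s_t; θ_a) + N_t
    • Execute action a_t and obtain the reward r(s_t, a_t) and the next state s_{t+1}
    • Store the experience (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer N
    • Randomly sample N_batch experiences from the replay buffer N for neural network training
    • Compute the loss function of the critic network being trained, using the approximate form of Eq. (26)
    • Update the critic network parameters using the gradient of the loss L(θ_c) with respect to θ_c
    • Update the actor network parameters θ_a using Eq. (23)
    • Update the target actor and target critic network parameters θ'_a and θ'_c using Eqs. (29) and (30)
    • t ← t+1
  end while
end while
Table 1  Simulation parameters
Parameter                                        Value
Total number of resource blocks (RBs) K          200
Number of time slots T                           20
Duration of one time slot                        1 ms
Duration of one mini-slot Δ                      0.125 ms
Number of mini-slots per time slot M             8
Bandwidth of one RB B                            180 kHz
Haptic packet arrival rate λ                     3
Haptic packet size D_{m,t}^l                     20 Byte
Gaussian random noise power δ^2                  –93 dBm
Haptic packet decoding error probability ε_l     10^-6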
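Two building blocks of Algorithm 1 can be sketched in plain Python: the experience replay buffer and the target-network update. Eqs. (26), (29), and (30) are not reproduced in this text, so the sketch assumes the standard DDPG forms, namely a one-step temporal-difference target for the critic and a soft (Polyak) update with rate `tau` for the target networks. The names `ReplayBuffer`, `soft_update`, `td_target`, `tau`, and `discount` are illustrative, and parameters are plain lists of floats rather than neural-network weights.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Uniformly sample up to batch_size experiences without replacement."""
        k = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


def td_target(reward, next_q, discount):
    """Assumed critic target: y = r + discount * Q'(s_next, mu'(s_next))."""
    return reward + discount * next_q


def soft_update(target_params, online_params, tau):
    """Assumed target update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a training loop, each step would call `store` once, draw a mini-batch with `sample`, fit the critic to `td_target` values, and then apply `soft_update` to both target networks. A small `tau` keeps the targets slowly varying, which is the usual motivation for target networks in DDPG.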
[1] LI Yuhong, ZHANG Peng, JIN Di, et al. Application's needs and challenges for future networks[J]. Telecommunications Science, 2019, 35(8): 2019203. doi: 10.11959/j.issn.1000-0801.2019203.
[2] WEI Xin, WU Dan, ZHOU Liang, et al. Cross-modal communication technology: A survey[J]. Fundamental Research, 2023. doi: 10.1016/j.fmre.2023.08.002.
[3] WEI Xin, ZHANG Meng, and ZHOU Liang. Cross-modal transmission strategy[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(6): 3991–4003. doi: 10.1109/TCSVT.2021.3105130.
[4] CHEN Mingkai, LIU Minghao, WANG Wenjun, et al. Codec for cross-modal semantic communication in 6G[J]. Journal of Signal Processing, 2023, 39(7): 1141–1154. doi: 10.16798/j.issn.1003-0530.2023.07.001.
[5] ALTAF KHATTAK S B, NASRALLA M M, and REHMAN I U. The role of 6G networks in enabling future smart health services and applications[C]. Proceedings of 2022 IEEE International Smart Cities Conference, Pafos, Cyprus, 2022: 1–7. doi: 10.1109/ISC255366.2022.9922093.
[6] LI Ang, CHEN Jianxin, WEI Xin, et al. 6G-oriented cross-modal signal reconstruction technology[J]. Journal on Communications, 2022, 43(6): 28–40. doi: 10.11959/j.issn.1000-436x.2022093.
[7] ZHOU Liang, WU Dan, CHEN Jianxin, et al. Cross-modal collaborative communications[J]. IEEE Wireless Communications, 2020, 27(2): 112–117. doi: 10.1109/MWC.001.1900201.
[8] STEINBACH E, STRESE M, EID M, et al. Haptic codecs for the tactile internet[J]. Proceedings of the IEEE, 2019, 107(2): 447–470. doi: 10.1109/JPROC.2018.2867835.
[9] ALSENWI M, TRAN N H, BENNIS M, et al. eMBB-URLLC resource slicing: A risk-sensitive approach[J]. IEEE Communications Letters, 2019, 23(4): 740–743. doi: 10.1109/LCOMM.2019.2900044.
[10] SUN Haipeng, YANG Jin, SU Junhao, et al. Joint resource scheduling for coexistence of URLLC and eMBB in 5G wireless networks[C]. Proceedings of 2021 Computing, Communications and IoT Applications, Shenzhen, China, 2021: 53–58. doi: 10.1109/ComComAp53641.2021.9653121.
[11] ZHAO Yunzhi, CHI Xuefen, QIAN Lei, et al. Resource allocation and slicing puncture in cellular networks with eMBB and URLLC terminals coexistence[J]. IEEE Internet of Things Journal, 2022, 9(19): 18431–18444. doi: 10.1109/JIOT.2022.3160647.
[12] REN Rong, WANG Jie, YU Jingming, et al. Hybrid puncturing and superposition scheme for multiplexing uRLLC and eMBB services based on deep reinforcement learning[C]. Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications, Chengdu, China, 2022: 806–810. doi: 10.1109/ICCC56324.2022.10065784.
[13] GUO Jiangfeng, NIE Gaofeng, TIAN Hui, et al. Puncture-predictive fairness scheduling scheme for eMBB and URLLC based on TD3 algorithm[C]. Proceedings of 2023 IEEE/CIC International Conference on Communications in China, Dalian, China, 2023: 1–6. doi: 10.1109/ICCC57788.2023.10233289.
[14] ZHUANSUN Chenlu, YAN Kedong, ZHANG Gongxuan, et al. Hypergraph-based joint channel and power resource allocation for cross-cell M2M communication in IIoT[J]. IEEE Internet of Things Journal, 2023, 10(17): 15350–15361. doi: 10.1109/JIOT.2023.3263567.
[15] WANG Lei, YIN Anmin, JIANG Xue, et al. Resource allocation for multi-traffic in cross-modal communications[J]. IEEE Transactions on Network and Service Management, 2023, 20(1): 60–72. doi: 10.1109/TNSM.2022.3207776.
[16] WEN Mengtian. Research on optimization of transmission strategy in cross-modal communications[D]. [Master dissertation], Nanjing University of Posts and Telecommunications, 2023. doi: 10.27251/d.cnki.gnjdc.2023.001196.
[17] WU Qingqing and ZHANG Rui. Towards smart and reconfigurable environment: Intelligent reflecting surface aided wireless network[J]. IEEE Communications Magazine, 2020, 58(1): 106–112. doi: 10.1109/MCOM.001.1900107.
[18] LIASKOS C, NIE Shuai, TSIOLIARIDOU A, et al. A new wireless communication paradigm through software-controlled metasurfaces[J]. IEEE Communications Magazine, 2018, 56(9): 162–169. doi: 10.1109/MCOM.2018.1700659.
[19] DI RENZO M, DEBBAH M, PHAN-HUY D T, et al. Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come[J]. EURASIP Journal on Wireless Communications and Networking, 2019, 2019: 129. doi: 10.1186/s13638-019-1438-9.
[20] GHANEM W R, JAMALI V, and SCHOBER R. Joint beamforming and phase shift optimization for multicell IRS-aided OFDMA-URLLC systems[C]. Proceedings of 2021 IEEE Wireless Communications and Networking Conference, Nanjing, China, 2021: 1–7. doi: 10.1109/WCNC49053.2021.9417582.
[21] CAO Xuelin, YANG Bo, HUANG Chongwen, et al. Reconfigurable intelligent surface-assisted aerial-terrestrial communications via multi-task learning[J]. IEEE Journal on Selected Areas in Communications, 2021, 39(10): 3035–3050. doi: 10.1109/JSAC.2021.3088634.
[22] MELGAREJO D C, KALALAS C, DE SENA A S, et al. Reconfigurable intelligent surface-aided grant-free access for uplink URLLC[C]. Proceedings of the 2020 2nd 6G Wireless Summit, Levi, Finland, 2020: 1–5. doi: 10.1109/6GSUMMIT49458.2020.9083788.
[23] ALMEKHLAFI M, ARFAOUI M A, ELHATTAB M, et al. Joint resource allocation and phase shift optimization for RIS-aided eMBB/URLLC traffic multiplexing[J]. IEEE Transactions on Communications, 2022, 70(2): 1304–1319. doi: 10.1109/TCOMM.2021.3127265.
[24] ZHOU Shuangquan, ZHANG Wenbin, XU Fanglei, et al. Energy-efficient resource allocation in DDPG-based integrated satellite-terrestrial network[C]. Proceedings of 2023 IEEE Globecom Workshops, Kuala Lumpur, Malaysia, 2023: 147–152. doi: 10.1109/GCWkshps58843.2023.10464487.
[25] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]. Proceedings of the 13th International Conference on Neural Information Processing Systems, Denver CO, USA, 1999: 1057–1063.