Joint Resource Management for Tunable Optical IRS-aided Cell-Free VLC Networks
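Summary: This paper studies an access scheme for cell-free Visible Light Communication (VLC) networks aided by a novel tunable optical intelligent metasurface (IRS). The IRS can provide additional reflective channels between transmitters and receivers, or it can exploit its tunable reflection coefficient to provide wireless access to network users directly. A system model of the tunable IRS-aided cell-free VLC access network is established, and the relationship between the network throughput and the work modes of the Light Emitting Diode (LED) lighting and communication devices, the work modes of the IRS, and the user access associations is derived. An access optimization problem that maximizes the network throughput is then formulated and solved in two steps: (1) for given numbers of LEDs and IRS units in modulation mode, a Deep Reinforcement Learning (DRL) algorithm based on Deep Deterministic Policy Gradient (DDPG) obtains the optimal work modes of the access points and the user association policy; (2) traversing the possible numbers of modulating LEDs and modulating IRS units yields the solution of the optimization problem. Simulation results show that jointly optimizing the work modes of the access points and the user association matrix improves the throughput of the IRS-aided cell-free VLC network.
Abstract: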
Objective: Visible Light Communication (VLC) is emerging as a key technology for future communication systems, offering abundant license-free spectrum, immunity to electromagnetic interference, and low-cost front-end devices. Light Emitting Diodes (LEDs) serve a dual purpose, providing both communication and illumination in indoor environments. However, VLC links are vulnerable, because blockage of the Line of Sight (LoS) path can interrupt communication. The Optical Intelligent Reconfigurable Surface (IRS) has been proposed to enhance communication performance and robustness by reconfiguring optical channels. Two main types of optical IRS are commonly used: mirror-based and meta-surface-based. Mirror-based IRS units introduce additional Non-LoS (NLoS) links with constant reflectance. This work proposes and investigates a cell-free VLC network assisted by a tunable IRS. The reflectance of the tunable IRS can be adjusted dynamically, so an IRS unit can act as a transmitter by modulating signals onto its reflectance while the incident light stays stable. In this system, at least one LED must operate in illumination mode, emitting light of constant intensity, whenever any IRS unit is in modulation mode. An IRS unit can alternatively operate in reflection mode to provide additional reflective links that enhance signal strength. The tunable IRS increases the number of Access Points (APs), enabling ultra-dense VLC networks that significantly improve throughput and spectral efficiency. The system model of the tunable IRS-assisted cell-free VLC network is derived, and the channel gain is calculated using the Lambertian model. The transmission rate of each user is determined by the work modes of the APs and by the associations of the IRS units with the LEDs and users, all of which are represented by binary variables. The primary objective of this study is to maximize the total throughput of the IRS-aided VLC network.
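As a rough illustration of the Lambertian model mentioned above, the sketch below evaluates the widely used LoS DC gain expression $H = \frac{(m+1)A}{2\pi d^2}\cos^m(\phi)\,T_s\,g\,\cos(\psi)$ for $0 \le \psi \le \text{FoV}$. The function name, argument names, and the optical filter gain `filter_gain` are assumptions made for this example; the default values follow the simulation parameter table (Lambertian order 1, PD area 1 cm², FoV 70°, gain 1). The paper's exact channel expressions, in particular for IRS-reflected paths, may differ.

```python
import math


def lambertian_los_gain(distance_m, irradiance_rad, incidence_rad,
                        pd_area_m2=1e-4, lambertian_order=1.0,
                        fov_rad=math.radians(70.0),
                        concentrator_gain=1.0, filter_gain=1.0):
    """LoS DC channel gain of a Lambertian emitter (textbook form).

    distance_m     -- LED-to-PD distance in metres
    irradiance_rad -- angle between the LED normal and the link direction
    incidence_rad  -- angle between the PD normal and the link direction
    filter_gain    -- optical filter gain; an assumption, not in the source
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    # No light is collected outside the receiver field of view.
    if not 0.0 <= incidence_rad <= fov_rad:
        return 0.0
    # A Lambertian source does not radiate behind its own plane.
    if not 0.0 <= irradiance_rad <= math.pi / 2:
        return 0.0
    m = lambertian_order
    return ((m + 1.0) * pd_area_m2 / (2.0 * math.pi * distance_m ** 2)
            * math.cos(irradiance_rad) ** m
            * filter_gain * concentrator_gain
            * math.cos(incidence_rad))


# Example: PD directly below an LED at 2 m, both normals aligned.
gain_on_axis = lambertian_los_gain(2.0, 0.0, 0.0)
```

The gain falls off with the square of the distance and drops to zero once the incidence angle leaves the field of view, which is why LoS blockage or misalignment is so damaging for VLC links.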
Methods: An optimization problem is formulated to maximize network throughput by jointly optimizing the work modes of the LEDs and IRS units, along with the user-AP associations. Because this integer optimization problem is non-convex, it is decomposed into two sub-problems. (1) Problem P2: With fixed numbers of LEDs and IRS units in modulation mode, a Deep Deterministic Policy Gradient (DDPG)-based Deep Reinforcement Learning (DRL) algorithm is applied to optimize the work mode of each AP and the user-AP associations. The binary variables are relaxed to continuous values in the range [0,1]. The optimization problem is modeled as a Markov Decision Process (MDP), where the state corresponds to the channel gains, the action represents the optimization variables, and the reward is the network throughput. To ensure convergence, the reward is modified so that it becomes negative and reflects any unsatisfied constraints, and the exploration noise in DDPG is switched dynamically between two Gaussian random variables. (2) Problem P1: The original optimization problem is then solved by enumerating all possible combinations of the numbers of LEDs and IRS units in modulation mode.
Results and Discussions: Simulations of the indoor tunable IRS-aided system are performed in Python with PyTorch. The simulation parameters for the indoor scenario and the neural network configurations of the DDPG algorithm are listed in Table 1 and Table 2, respectively. The results demonstrate the following: (1) The convergence and final reward of the modified DDPG algorithm (denoted as DDPG-O) are compared with those of the unmodified version (denoted as DDPG-N) in solving Problem P2 (Fig. 4). The modified DDPG algorithm converges efficiently and achieves an access and association policy that maximizes network throughput. (2) The maximized throughput obtained by solving Problem P1 is presented for various numbers of LEDs in modulation mode and various optical powers (Fig. 5).
The policy with a single LED in illumination mode achieves the maximum throughput when an appropriate number of IRS units operate in modulation mode. (3) The relationship between the maximized throughput and the number of IRS units is analyzed (Fig. 6). The total throughput increases with the number of IRS units, although not linearly. (4) Scenarios with equal numbers of users and LEDs are also simulated (Fig. 7). The total network throughput with and without IRS APs is nearly identical when the number of users does not exceed the number of LEDs. Thus, the VLC network benefits more from IRS APs when the number of users exceeds the number of LEDs.
Conclusions: A tunable IRS-assisted cell-free VLC network has been proposed, in which IRS units operate either in reflection mode to provide additional NLoS channels or in modulation mode to provide wireless access for users. The channel and transmission models are developed, and an optimization problem is formulated to jointly select the work modes of the APs and the user associations with the objective of maximizing network throughput. A modified DDPG algorithm is applied to solve for the optimal policy for given numbers of modulating LEDs and IRS units, and the full problem is then solved by exploring all possible combinations of these numbers. Simulation results verify the effectiveness of the proposed algorithm and show that the network throughput can be significantly improved by incorporating IRS APs, particularly when the number of users is large.
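The two-step decomposition described in the Methods can be sketched as an outer exhaustive search (Problem P1) wrapped around an inner solver (Problem P2). In the sketch below, `solve_p1`, `solve_p2`, and `toy_p2` are hypothetical names; `solve_p2` stands in for the trained DDPG agent and is assumed to return the best throughput found for a given pair of counts. The only feasibility rule encoded is the one stated in the abstract: at least one LED must stay in illumination mode whenever any IRS unit is in modulation mode.

```python
from typing import Callable, Optional, Tuple


def solve_p1(num_leds: int, num_irs: int,
             solve_p2: Callable[[int, int], float]
             ) -> Tuple[float, Optional[Tuple[int, int]]]:
    """Enumerate (M', K') and keep the pair with the highest throughput.

    solve_p2(m_mod, k_mod) is a placeholder for the inner DRL solver.
    """
    best_throughput = float("-inf")
    best_counts = None
    for m_mod in range(num_leds + 1):        # LEDs in modulation mode
        for k_mod in range(num_irs + 1):     # IRS units in modulation mode
            # A modulating IRS unit needs constant-intensity incident
            # light, so at least one LED must remain in illumination mode.
            if k_mod > 0 and m_mod == num_leds:
                continue
            throughput = solve_p2(m_mod, k_mod)
            if throughput > best_throughput:
                best_throughput = throughput
                best_counts = (m_mod, k_mod)
    return best_throughput, best_counts


def toy_p2(m_mod: int, k_mod: int) -> float:
    """Made-up throughput surface used only to exercise the search."""
    return float(10 * m_mod + k_mod * (14 - k_mod))


best_throughput, best_counts = solve_p1(4, 16, toy_p2)
```

With $M = 4$ LEDs and $K = 16$ IRS units the outer loop visits at most $5 \times 17$ pairs, so the cost of exhaustive search is dominated by the inner solver rather than by the enumeration itself.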
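1 DDPG-O: DRL-based access parameter optimization for the tunable IRS-aided cell-free VLC network
Input: initial state ${s_0}$, number of LEDs in modulation mode $M'$, number of IRS units in modulation mode $K'$
Output: sum rate of the system users and the corresponding optimal policy $\{ L,I,G,F\}$
1. Initialize the parameters and gradients of the Actor, Critic, target-Actor, and target-Critic networks.
2. For episode $\in$ episodes do:
3.   Randomly draw the initial state ${s_t}$ from the Replay Buffer; if the Replay Buffer is not ready, use ${s_0}$. Initialize set = 0.
4.   For t $\in$ Max steps do:
5.     Given the current state ${s_t}$, the Actor network outputs the action ${a_t}$ according to the current policy $\pi ({s_t},{a_t})$.
6.     if set < 0: draw Gaussian noise ${N_1}(0,\sigma _1^2)$ and add it to the action, ${a_t}^\prime = {a_t} + {N_1}$; else: draw Gaussian noise ${N_2}(0,\sigma _2^2)$ and add it to the action, ${a_t}^\prime = {a_t} + {N_2}$.
7.     Interact with the environment using the action ${a_t}^\prime$ to obtain the reward ${r_t}$ and the next state ${s_{t + 1}}$.
8.     if ${r_t} < 0$: store the tuple in the Replay Buffer with probability $\varsigma$; else: store it in the Replay Buffer directly.
9.     If the Replay Buffer is ready, sample batch size tuples $({s_t},{a_t},{s_{t + 1}},{r_t})$ for the agent to learn from, and update the parameters of the Actor and Critic networks by gradient back-propagation; otherwise only store the tuple $({s_t},{a_t},{s_{t + 1}},{r_t})$ obtained in this step.
10.     Soft-update the parameters of the target-Actor and target-Critic networks.
11.     Compute the mean reward $\bar r$ over the most recent $\eta$ interactions with the environment and let $set = \bar r$.
12.     ${s_t} = {s_{t + 1}}$
13.   end for
14. end for
Table 2 Simulation parameters of the system model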
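| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Number of LEDs | $M = 4$ | Lambertian order | $m = 1$ |
| Number of IRS units | $K = 16$ | PD field of view | ${\text{FoV}} = {70^ \circ }$ |
| Number of PDs | $N = 5$ | Gain function | $g = 1$ |
| Number of LEDs in modulation mode | $0 \le M' \le M$ | Internal refractive index | ${n_r} = 1.5$ |
| Number of IRS units in modulation mode | $0 \le K' \le K$ | Bandwidth | $W = 2 \times {10^8}\;{\text{Hz}}$ |
| PD area | $1\;{\text{c}}{{\text{m}}^2}$ | Dimming power | $A = 5$ |
| Maximum reflection coefficient | $\alpha = 0.9$ | Dimming coefficient | $\xi = 0.5$ |
| Noise power | ${\sigma ^2} = 1 \times {10^{ - 21}}$ | PD responsivity | $\rho = 0.5$ |
Table 3 Parameter settings of the DDPG-O algorithm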
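| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| BufferSize | 100 000 | Noise coefficient 1 | ${\sigma _1} = 0.15$ |
| Batchsize | $B = 32$ | Noise coefficient 2 | ${\sigma _2} = 0.08$ |
| Hidden-layer neurons 1 | ${H_1} = 880$ | Discount factor | $\gamma = 0.98$ |
| Hidden-layer neurons 2 | ${H_2} = 600$ | Policy network learning rate | ${l_{Policy}} = 1 \times {10^{ - 3}}$ |
| Policy network depth | ${D_P} = 1$ | Value network learning rate | ${l_{Critic}} = 1 \times {10^{ - 2}}$ |
| Value network depth | ${D_C} = 2$ | Soft-update constant | $\tau = 0.000\;01$ |
| Discard rate | $\zeta = 0.85$ | Number of episodes | $E = 1\;000$ |
| Noise switching length | $\eta = 6$ | Maximum steps | $s = 100$ |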
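The two modifications that distinguish DDPG-O from plain DDPG, noise switching driven by the recent mean reward and probabilistic storage of negative-reward transitions, can be sketched without any neural network code. The helper names below are hypothetical. Clipping the perturbed action to [0,1] is an assumption motivated by the relaxed binary variables, and the storage probability `store_prob=0.15` is an illustrative value only, since the source does not state how it relates to the listed discard rate. The noise levels and the window length are taken from the parameter table.

```python
import random
from collections import deque

SIGMA_1 = 0.15   # noise std used while the recent mean reward is negative
SIGMA_2 = 0.08   # noise std used otherwise
ETA = 6          # number of recent rewards averaged into "set"


def select_sigma(recent_mean_reward):
    """Explore more strongly while constraints are still being violated."""
    return SIGMA_1 if recent_mean_reward < 0 else SIGMA_2


def perturb_action(action, sigma, rng):
    """Add zero-mean Gaussian noise; clipping to [0, 1] is an assumption."""
    return [min(1.0, max(0.0, a + rng.gauss(0.0, sigma))) for a in action]


def maybe_store(buffer, transition, reward, store_prob, rng):
    """Always keep non-negative rewards; keep negative ones with store_prob."""
    if reward >= 0 or rng.random() < store_prob:
        buffer.append(transition)
        return True
    return False


rng = random.Random(0)
replay_buffer = deque(maxlen=100_000)
recent_rewards = deque(maxlen=ETA)
set_value = 0.0  # plays the role of "set" in the algorithm listing

# Made-up reward sequence standing in for environment interaction.
for step, reward in enumerate([-2.0, -1.0, 3.0, 4.0, 5.0, 6.0, 7.0]):
    sigma = select_sigma(set_value)
    noisy_action = perturb_action([0.2, 0.9], sigma, rng)
    maybe_store(replay_buffer, (step, noisy_action, reward), reward,
                store_prob=0.15, rng=rng)
    recent_rewards.append(reward)
    set_value = sum(recent_rewards) / len(recent_rewards)
```

Thinning out negative-reward transitions keeps the replay buffer from being dominated by infeasible actions early in training, while the larger noise level helps the agent escape regions where constraints are still violated.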