Joint Task Allocation, Communication Base Station Association and Flight Strategy Optimization Design for Distributed Sensing Unmanned Aerial Vehicles
-
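Abstract (translated from the Chinese): This paper studies distributed sensing with multiple Unmanned Aerial Vehicles (UAVs). To coordinate the behavior of the UAVs, a task sensing and data backhaul protocol is designed, and the joint optimization of UAV task allocation, backhaul base station association, and flight strategy is formulated as a mixed-integer nonlinear programming problem. Because the problem has a complex mathematical structure, and centralized optimization algorithms suffer from high computational complexity and heavy information-exchange overhead, the problem is recast as a cooperative Markov Game (MG) with a reward function that combines cost and utility. Given the tightly coupled continuous and discrete action spaces of the MG, a solution algorithm based on Independent Learners (IL), the Multi-Agent Independent-Learning Compound-Action Actor-Critic (MA-IL-CA2C), is designed. Simulation results show that, compared with the baseline algorithms, the proposed algorithm significantly improves system revenue and reduces network energy consumption.
Abstract: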
Objective The demand for Unmanned Aerial Vehicles (UAVs) in distributed sensing applications has increased significantly due to their low cost, flexibility, mobility, and ease of deployment. In these applications, the coordination of multi-UAV sensing tasks, communication strategies, and flight trajectory optimization presents a significant challenge. Although there have been preliminary studies on the joint optimization of UAV communication strategies and flight trajectories, most existing work overlooks the impact of the randomly distributed and dynamically updated task airspace model on the optimal design of UAV communication and flight strategies. Furthermore, accurate UAV energy consumption modeling is often lacking when establishing system design goals. Energy consumption during flight, sensing, and data transmission is a critical issue, especially given the UAV’s limited payload capacity and energy supply. Achieving an accurate energy consumption model is essential for extending UAV operational time. To address the requirements of multiple UAVs performing distributed sensing, particularly when tasks are dynamically updated and data must be transmitted to ground base stations, this paper explores the optimal design of joint UAV sensing task allocation, base station association for data backhaul, flight strategy planning, and transmit power control.
Methods To coordinate the relationships among UAVs, base stations, and sensing tasks, a protocol framework for multi-UAV distributed task sensing applications is first proposed. This framework divides the UAVs’ behavior during distributed sensing into four stages: cooperation, movement, sensing, and transmission. The framework ensures coordination in the UAVs’ movement to the task area, task sensing, and the backhaul transmission of sensed data. A sensing task model based on dynamic updates, a UAV movement model, a UAV sensing behavior model, and a data backhaul transmission model are then established.
A revenue function, combining task sensing utility and task execution costs, is designed, leading to a joint optimization problem of UAV task allocation, communication base station association, and flight strategy. The objective is to maximize the long-term weighted utility-cost. Given that the optimization problem involves high-dimensional decision variables in both discrete and continuous forms, and the objective function is non-convex with respect to these variables, the problem is a typical non-convex Mixed-Integer Non-Linear Programming (MINLP) problem. It falls within the NP-Hard complexity class. Centralized optimization algorithms for this formulation require a central node with high computational capacity and the collection of substantial additional information, such as channel state and UAV location data. This results in high information-interaction overhead and poor scalability. To overcome these challenges, the problem is reformulated as a Markov Game (MG). An effective algorithm is designed by leveraging the distributed coordination concept of Multi-Agent (MA) systems and the exploration capability of deep Reinforcement Learning (RL) within the optimization solution space. Specifically, due to the complex coupling between the continuous and discrete action spaces in the MG problem, a novel solution algorithm called Multi-Agent Independent-Learning Compound-Action Actor-Critic (MA-IL-CA2C) based on Independent Learning (IL) is proposed. The core idea is as follows: first, the independent-learning algorithm is applied to extend single-agent RL to a MA environment. Then, deep learning is used to represent the high-dimensional action and state spaces. To handle the combined discrete and continuous action spaces, the UAV action space is decomposed into discrete and continuous components, with the DQN algorithm applied to the discrete space and the DDPG algorithm to the continuous space. 
Results and Discussions The computational complexity of action selection and training for the proposed MA-IL-CA2C algorithm is theoretically analyzed. The results show that its complexity is almost equivalent to that of the two benchmark algorithms, DQN and DDPG. Additionally, the performance of the proposed algorithm is simulated and analyzed. When compared with the DQN, DDPG, and Greedy algorithms, the MA-IL-CA2C algorithm demonstrates lower network energy consumption throughout the network operation (Fig. 6), improved system revenue (Fig. 5, Fig. 8, and Fig. 9), and optimized UAV flight strategies (Fig. 7).
Conclusions This paper addresses and solves the optimal design problems of joint UAV sensing task allocation, data backhaul base station association, flight strategy planning, and transmit power control for multi-UAV distributed task sensing. A new MA-IL-CA2C algorithm based on IL is proposed. The simulation results show that the proposed algorithm achieves better system revenue while minimizing UAV energy consumption.
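1. Introduction
The Internet of Things (IoT) uses network technologies and a variety of information sensors to connect independently identifiable machines, people, and objects on demand under agreed protocols, so that they can exchange information and interact cooperatively, realizing the "Internet of Everything" [1]. Sensors are the nerve endings of an IoT system: they acquire the required external data and are the core link between the digital and physical worlds. As IoT applications expand, security at the sensor level is receiving growing attention. On the one hand, sensor nodes are usually deployed in unattended environments with little or no security protection [2], which creates a risk of information leakage. On the other hand, sensor accessories have little available memory and limited computing power [3]; a typical sensor accessory may have only 512 B of memory, so conventional cryptographic techniques such as the Advanced Encryption Standard (AES) cannot be used. Delivering sensor data securely and trustworthily with small computation and storage overhead has therefore become an urgent requirement for IoT security.
Physical Unclonable Function (PUF) generators exploit the random, inherent properties of physical structures to provide low-overhead, highly reliable solutions for IoT security. Pappu et al. [4] first proposed the PUF concept, implementing a physical one-way function based on optical principles to produce Challenge Response Pairs (CRPs) that can serve as unique identifiers. The field has since moved to predominantly silicon-based PUF generators, which use the small process variations in silicon circuit fabrication to produce hardware fingerprints that are unique, random, and unclonable; examples include the arbiter PUF [5], the ring oscillator PUF [6], the Static Random Access Memory PUF (SRAM PUF) [7,8], and the flip-flop PUF [9]. However, integrating these silicon PUFs directly into resource-constrained sensor nodes increases design difficulty and adds cost [10]. Researchers have therefore begun to explore lower-cost PUF generators built from existing sensor components [11]. Rosenfeld et al. [12] exploited the fact that the dark material coated on the translucent bottom layer of a sensor is uneven and its optical transmittance inconsistent, so that the photodiodes of each chip differ in optical sensitivity, and proposed an architecture that removes the separation between sensor and cryptography; however, the architecture still needs a conventional PUF to convert the input challenge into an initial vector for the next step, which adds circuit overhead. Dey et al. [13] showed that accelerometers have distinctive fingerprints; test data from 80 stand-alone accelerometer chips and from the internal accelerometers of 25 Android phones and 2 tablets confirm that these fingerprints exist. Aysu et al. [14] used gyroscope outputs to construct unpredictable PUF responses, but the required challenge-response pairs cannot be reproduced. Labrado et al. [15] modeled piezoelectric sensors and found that, under the same AC voltage, the equivalent impedance differs between sensors because of manufacturing variations; this difference can be used to generate PUF data, but the design requires an external AC voltage source.
Advances in gas sensing have driven the wide use of the IoT in the management of gas (and oil). Gas sensors of many types and in large numbers are typically integrated into IoT systems and installed in places such as liquefied gas storage depots, heating and ventilation markets, and transportation facilities. Because no effective technique currently distinguishes these gas sensors from one another, they lack a "one device, one key" property. When an anomaly occurs at an observation point, it is therefore difficult to determine exactly which gas sensor raised the alarm, that is, to locate the source of the gas (oil) leak, and the best window for repair is lost; malicious incidents such as theft of or tampering with transmitted data can also occur. To address the security of sensing nodes in intelligent IoT systems, this paper extracts physical feature information from the environmental changes detected by gas sensors and designs a highly stable PUF generator that gives each sensor a locating tag, providing bottom-up security protection for the IoT system.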
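2. Variation Analysis of Semiconductor Gas Sensors
A gas sensor converts the composition and concentration of a detected gas into an electrical signal, but random process variations during fabrication make the output signal deviate from its theoretical value; these fabrication variations can therefore be used to build a gas-sensing PUF generator. A gas sensor detection system has three main parts: the semiconductor gas-sensitive material, which recognizes gas composition specifically and responds to concentration; the sensitive component, which converts the non-electrical signal into an electrical signal; and the auxiliary instruments that record the signal. The structure is shown in Fig. 1. The semiconductor gas-sensitive material is a nanomaterial with a high specific surface area, so the contact area between gas and material is large and more active sites are available for gas adsorption, which improves gas-sensing performance.
Electrostatic Spray Deposition (ESD) is a common method for preparing nanomaterials [16]. Formation of the Taylor cone and whipping of the jet are two key and highly random stages of electrostatic spraying. The Taylor cone surface is the boundary between the liquid and gas phases, where rapid adsorption of ambient gas and solvent evaporation occur together; this inevitably disturbs the interior of the cone and affects its shape [17]. Jet whipping refers to the trajectory of the polymer jet during electrostatic spraying, a complex three-dimensional, non-straight "whipping" motion. The jet is initially straight, then bends and becomes unstable. Shin et al. [18] proposed a whipping model for the unstable stage of the jet and used linear instability analysis to describe the onset of whipping. For known fluid properties and process parameters, the amplification of the perturbation is given by Eq. (1):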
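$$\Gamma(E_\infty, Q) = \ln\left[\frac{A(s)}{A(0)}\right] = \int_0^{s} \frac{\omega(h, E, \sigma)}{Q}\,\pi h^2\,\mathrm{d}h \tag{1}$$
where $\Gamma(E_\infty, Q)$ is the instability amplification factor, $A(s)$ is the amplitude, $s$ is the distance traveled downstream, $\omega$ is the growth rate, $h$ is the jet radius, $Q$ is the flow rate, and $\sigma$ is the surface charge density of the jet.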
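The ability of the fiber surfaces in the nanomaterial to adsorb gas molecules gives each sensor its own gas-sensing characteristics. Parameters that affect spraying, such as liquid viscosity, humidity, and temperature, inevitably change during the experiment, so jet whipping is not fully controllable and the diameters and orientations of the nanofibers vary. As the polymer solution travels through the electric field to the collector plate, countless fibers of varying direction and thickness are ejected from the nozzle and stack layer upon layer, so the density of each fiber region is random and unique. Each region therefore adsorbs gas differently, and this feature can be used to obtain a random, unclonable gas-sensing PUF generator.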
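3. Design of the Gas-Sensing PUF Generator
3.1 Preparation of the Semiconductor Gas Sensor
The nanomaterial is prepared by ESD with the setup shown in Fig. 2. The procedure is as follows. Pd(NO3)2·2H2O (5 mg), SnCl4·5H2O (701 mg), and polyvinylpyrrolidone (1200 mg) are dissolved in a mixture of dimethylformamide (5 ml) and ethanol (5 ml) and stirred at room temperature for 6 h until the solution is homogeneous. The solution is loaded into a syringe, which is fixed on a syringe pump. The anode of a high-voltage supply is connected to the syringe nozzle and the cathode to the grounded collector plate, with the two kept 15 cm apart. Under a 16 kV electric field the syringe nozzle begins to eject fiber material. The fiber material is placed in a muffle furnace, heated in air at 1 ℃/min to 600 ℃, held for 2 h, and then cooled.
Sensors are generally of two structural types: directly heated and indirectly heated. In an indirectly heated gas-sensing device, a high-resistance heating wire is placed inside an alumina ceramic tube, comb-shaped gold electrodes are coated on the outside of the tube, and the gas-sensitive semiconductor material is coated over the gold electrodes. This structure overcomes the small heat capacity, sensitivity to ambient airflow, and unstable measurement of directly heated devices, and markedly improves device stability. The fabrication procedure is as follows. The prepared nanomaterial is mixed with deionized water in a fixed proportion to form a paste. The paste is coated onto an alumina tube with platinum wires fixed at both ends, and the tube is likewise heated in air at 1 ℃/min to 600 ℃ and held for 2 h. After cooling, the tube is removed. A thin alloy heating wire 0.05 mm in diameter and 10.5 mm long is threaded along the axis of the alumina ceramic tube for heating. The heating wire and the platinum wires that serve as measuring electrodes are soldered to the sensor base, which completes the indirectly heated device. The sensor is shown in Fig. 3. To improve stability and repeatability, the sensor is then aged at an ambient temperature of 300 ℃ for 2 to 7 d.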
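3.2 Sensor Response Extraction
To extract the sensor responses, the gas-sensing PUF generator test platform shown in Fig. 4 was built. It consists of a test chamber, the gas-sensing unit, an evaporation stage, a syringe pump, a flow meter, and other parts. During a test, valves 1 to 4 are opened; a circuit carrying 8 gas sensors (forming one array) is placed in the test chamber, which is then sealed; and a 4.5 V DC supply powers the heating wires of the gas-sensing unit. The time-resistance curve is observed with a data acquisition unit (such as an Agilent 34970A with its own acquisition software) until it flattens, which indicates that the sensors have reached a quasi-steady state. A microsyringe draws a set amount of the liquid target substance, and the syringe pump injects the liquid onto the evaporation stage. Because the stage is at 100 ℃, the liquid vaporizes quickly and diffuses evenly. Meanwhile the data acquisition unit records the resistance of the 8 gas sensors in real time. Once the resistances stabilize, the pump stops and air is allowed to refill the test chamber. Alternating these steps yields the response of the sensor array to a given concentration of the target gas.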
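3.3 Data Generation for the Highly Stable PUF Generator
Because of the random variations in gas sensor fabrication, the collected resistance values are unique. These data are processed with the random-resistance multi-bit balancing algorithm [15] as follows. Three sensors are grouped into one sensor cluster, the summed resistances of two clusters are compared, and the result is expressed as one binary bit: 1 if the first cluster has the larger sum, and 0 otherwise. Choosing 3 of the 8 sensors as a cluster gives $C_8^3 = 56$ choices; picking any two of these clusters for comparison gives $C_{56}^2 = 1540$ possible pairings. Cluster selection and comparison pit the measurements of different sensors against one another at random, and 128 such comparisons yield the 128-bit PUF response. The response is not biased toward any single sensor, so it is balanced. Concretely, the data are generated with an 8-bit random-resistance balancing algorithm, which is called 16 times to produce the 128-bit response; its pseudocode is given in Table 1. The algorithm assumes that the 8-bit subset is held in the array bit and that the array v holds the resistance values of the 8 gas sensors, with the value of sensor 0 at position 0 of v, the value of sensor 1 at position 1, and so on. The arrays lef and r give the positions of the two selected sensor clusters; the resistance values at those positions in v are fetched and their sums compared. The value of place is then increased by 1, that is, every selected sensor position is shifted by 1, which determines the sensors used in the next comparison. After 8 comparisons one pass is complete and 8 response bits have been generated. The initial values of lef and r are then changed to generate the next 8-bit subset.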
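The 8-bit balancing pass described above can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names `balance_8bit` and `puf_response`, the example resistance values, and the particular cluster positions are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def balance_8bit(v: Sequence[float],
                 lef: Sequence[int],
                 r: Sequence[int]) -> List[int]:
    """One pass of the 8-bit balancing step (hypothetical helper name).

    v   : resistance values of the 8 sensors, v[0] for sensor 0, etc.
    lef : positions of the 3 sensors in the first cluster
    r   : positions of the 3 sensors in the second cluster
    Each of the 8 comparisons shifts every position by one (mod 8).
    """
    if len(v) != 8 or len(lef) != 3 or len(r) != 3:
        raise ValueError("expected 8 resistances and two 3-sensor clusters")
    bits = []
    for i in range(8):
        lsum = sum(v[(i + p) % 8] for p in lef)
        rsum = sum(v[(i + p) % 8] for p in r)
        bits.append(1 if lsum > rsum else 0)
    return bits


def puf_response(v: Sequence[float],
                 cluster_pairs: Sequence[Tuple[Sequence[int], Sequence[int]]]
                 ) -> List[int]:
    """Concatenate one 8-bit pass per (lef, r) pair; 16 pairs give 128 bits."""
    response: List[int] = []
    for lef, r in cluster_pairs:
        response.extend(balance_8bit(v, lef, r))
    return response


# Illustrative resistances only (arbitrary units), not measured data.
example_v = [101.0, 115.0, 98.0, 121.0, 109.0, 110.0, 95.0, 124.0]
example_bits = balance_8bit(example_v, (0, 1, 2), (3, 4, 5))
```

Because every sensor position is rotated through both clusters during the 8 comparisons, no single sensor dominates the resulting bits, which is the balancing property the text describes.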
Table 1 Pseudocode of the 8-bit random-resistance balancing algorithm
    int bit[8]
    int lef[3]
    int r[3]
    double v[8]
    i = 0
    place = 0
    do {
        lsum = v[(i+lef[0]) mod 8] + v[(i+lef[1]) mod 8] + v[(i+lef[2]) mod 8]
        rsum = v[(i+r[0]) mod 8] + v[(i+r[1]) mod 8] + v[(i+r[2]) mod 8]
        if lsum > rsum
            then bit[place] = 1
            else bit[place] = 0
        place = place + 1
        i = i + 1
    } while (i < 8)
    return bit
4. Experimental Results and Analysis
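This section has three parts: (1) material characterization; (2) analysis of variation characteristics; and (3) performance analysis of the PUF generator. A Scanning Electron Microscope (SEM) is used to characterize the morphology of the material and observe the structure and appearance of the nanomaterial; X-Ray Diffraction (XRD) is used to determine and analyze the phase structure of the product; and a test platform is built to extract the sensor variations. The output responses are evaluated with three common metrics: uniqueness, reliability, and randomness.
4.1 Material Characterization
SEM shows that the nanomaterial has a granular oxide morphology (Fig. 5). The nanoparticles vary irregularly in size, are scattered with local agglomeration, and their distribution is highly random. At every magnification the oxide particles differ in shape, and each region has a different contact area with gas molecules, which is the origin of the differences between sensor PUF generators.
Fig. 6 shows the XRD pattern of a Pd-SnO2 nanomaterial sample after sintering at 600 ℃. The diffraction peaks (110), (101), (200), (211), (220), (310), and (321) match the JCPDS (Joint Committee on Powder Diffraction Standards) standard card for SnO2 (PDF#77-0447). Pd doping does not change the crystal structure of SnO2, and no characteristic PdO peaks appear, because PdO makes up only a very small fraction of the prepared nanomaterial.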
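4.2 Analysis of Variation Characteristics
The variation characteristic describes how different sensors deviate under the same gas stimulus. An Agilent multi-channel data acquisition unit records the raw electrical signal in real time, capturing how sensor resistance changes with the formaldehyde concentration in the test chamber. The sensor response is defined as [19]
$$R = R_a / R_g \tag{2}$$
where $R_a$ and $R_g$ are the resistances of the gas sensor in air and in the target gas, respectively. Fig. 7 shows the sensor response over time calculated with Eq. (2). The prepared Pd-SnO2 gas sensors were cycled twice at a formaldehyde concentration of 200 ppm. In the flatter part of the response curves, sensor A and sensor B show a clear deviation: $R_A$ and $R_B$ follow the same trend, with $R_A > R_B$. This consistent gap makes response bit flips during resistance comparison less likely and improves the stability of the output response.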
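4.3 Performance Analysis of the PUF Generator
4.3.1 Randomness
Randomness describes the distribution of logic 0 and logic 1 in the PUF generator output. Ideally logic 0 and logic 1 are equally probable, that is, randomness is 100%. It is calculated with Eq. (3) [20]:
$$\text{Randomness} = \left(1 - \left|1 - 2P(r=1)\right|\right) \times 100\% \tag{3}$$
where $r$ is the output response and $P(r=1)$ is the probability of a 1 in the output response. In the experiment, 50 gas-sensing PUF generator samples were prepared and 6400 response bits were tested. Of these, 3138 were "0" and 3262 were "1", giving a randomness of 98.06% (Fig. 8).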
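As a quick check of the reported figure, the randomness metric can be evaluated directly from the bit counts given in the text. The function name `randomness` is a hypothetical helper introduced here for illustration.

```python
def randomness(ones: int, total: int) -> float:
    """Randomness in percent: (1 - |1 - 2 * P(r = 1)|) * 100."""
    if total <= 0 or not 0 <= ones <= total:
        raise ValueError("need 0 <= ones <= total and total > 0")
    p_one = ones / total
    return (1.0 - abs(1.0 - 2.0 * p_one)) * 100.0


# Counts reported in the text: 3262 ones among 6400 response bits.
reported = randomness(3262, 6400)
print(f"{reported:.2f}%")
```

With 3262 ones out of 6400 bits, P(r = 1) is about 0.5097, so the metric evaluates to 98.0625%, which rounds to the 98.06% stated in the text.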
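4.3.2 Uniqueness
Uniqueness describes how well the responses of multiple devices to the same challenge can be told apart, and is computed from the inter-chip Hamming Distance (HD). Ideally, uniqueness is close to 50%. It is calculated with Eq. (4) [20]:
$$\text{Uniqueness} = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{\mathrm{HD}(R_i, R_j)}{n} \times 100\% \tag{4}$$
where $k$ is the number of PUF generators, $R_i$ and $R_j$ are the output responses of the $i$-th and $j$-th PUF generators, $\mathrm{HD}(R_i, R_j)$ is the Hamming distance between them, and $n$ is the number of response bits. The uniqueness of this PUF generator is 49.04%, close to the ideal value of 50% (Fig. 9).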
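The uniqueness metric averages the normalized Hamming distance over all device pairs. A minimal sketch follows; the helper names `hamming_distance` and `uniqueness` and the toy 4-bit responses are assumptions for illustration, not data from the experiment.

```python
from itertools import combinations
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses: Sequence[Sequence[int]]) -> float:
    """Average inter-chip Hamming distance in percent over all k(k-1)/2 pairs."""
    k = len(responses)
    if k < 2:
        raise ValueError("need at least two PUF responses")
    n = len(responses[0])
    total = sum(hamming_distance(ri, rj) / n
                for ri, rj in combinations(responses, 2))
    return 2.0 / (k * (k - 1)) * total * 100.0


# Toy responses: every pair differs in exactly 2 of 4 bits.
toy = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
toy_uniqueness = uniqueness(toy)
```

`itertools.combinations` enumerates each unordered pair once, which matches the double sum over i < j in the formula.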
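4.3.3 Reliability
Reliability is the likelihood that a PUF generator always produces the correct response to a given input challenge. Ideally reliability is 100%, meaning that the PUF generator always produces the correct response. The reliability of an n-bit response is calculated with Eq. (5) [20]:
$$\text{Reliability} = 100\% - \frac{1}{m} \sum_{v=1}^{m} \frac{\mathrm{HD}(R_u, R_v)}{n} \times 100\% \tag{5}$$
where $m$ is the number of measurements under the same challenge, $n$ is the number of response bits, $R_u$ is the chosen reference response, and $R_v$ is the response of the $v$-th measurement. Five PUF generator samples were tested over a voltage range of 4.2 to 4.9 V in 0.1 V steps, with 4.6 V as the reference point; the results are shown in Fig. 10. Reliability generally declines as the voltage moves away from the reference point, which is typical PUF behavior.
Reliability also reflects how the response changes over time. Five PUF generator samples were tested continuously for 400 s at room temperature and atmospheric pressure, with the first response of each PUF generator used as the reference; the results are shown in Fig. 11. Reliability stays at 100% for the first 90 s and remains above 95% for the following 310 s.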
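The reliability metric compares repeated measurements of one device against a reference response. The sketch below assumes hypothetical helper names (`hamming_distance`, `reliability`) and uses made-up 8-bit responses purely to show the arithmetic.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b))


def reliability(reference: Sequence[int],
                measurements: Sequence[Sequence[int]]) -> float:
    """100% minus the mean normalized Hamming distance to the reference."""
    m = len(measurements)
    n = len(reference)
    if m == 0 or n == 0:
        raise ValueError("need a non-empty reference and at least one measurement")
    mean_fraction = sum(hamming_distance(reference, r) / n
                        for r in measurements) / m
    return 100.0 - mean_fraction * 100.0


# Made-up example: two repeated readings, the second with one flipped bit.
ref = [0, 1, 0, 1, 0, 1, 0, 1]
repeats = [[0, 1, 0, 1, 0, 1, 0, 1],
           [0, 1, 0, 1, 0, 1, 0, 0]]
example_reliability = reliability(ref, repeats)
```

In this example the mean intra-chip distance is (0 + 1/8) / 2 = 6.25%, so the reliability is 93.75%.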
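Table 2 compares the performance with that of other types of PUF generators. The proposed PUF generator has a randomness of 98.06%, a reliability of 97.85%, and a uniqueness of 49.04%.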
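5. Conclusion
The proposed design of a highly stable physical unclonable function generator based on gas sensors effectively combines the sensing device with the PUF generator. By analyzing the random variations in sensor fabrication, testing sensor responses under multiple external stimuli, and generating the PUF data with the random-resistance multi-bit balancing algorithm, the design relies on the sensor components themselves rather than a dedicated PUF circuit module, which reduces resource overhead. The experimental results show that the gas-sensor-based PUF generator performs well in reliability, randomness, and uniqueness. It offers a route to high security in severely resource-constrained systems and supports the secure development of the IoT.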
-
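Algorithm 1 MA-IL-CA2C
(1) Initialization: set $t = 0$ and the maximum number of decision periods $T$; choose the experience replay capacity $N_c$, the batch size $N_b$, the network learning rates $\alpha_{\theta_n^t}$ and $\alpha_{\omega_n^t}$, and the soft-update parameter $\rho$;
(2) For each agent $n \in \mathcal{N}$: randomly initialize the network parameters $\theta_n^t$, $\hat{\theta}_n^t$, $\omega_n^t$, $\hat{\omega}_n^t$, and set the initial state $s^0$;
# Main loop
(3) If $t \le T$:
  (a) For each agent $n \in \mathcal{N}$:
    According to Eq. (28), select the discrete action $a_n^{\mathrm{dis},t}$ in state $s_n^t$, i.e., select sensing task $m$ and BS $k$;
    # Cooperation stage
    Report the decision $D_n^c = \{n, a_n^{\mathrm{dis},t}\}$ on the control channel and receive the decisions of the other UAVs;
    Determine the continuous action $a_n^{\mathrm{con},t} = v_n^t(s^t, a_n^{\mathrm{dis},t})$ from the discrete action $a_n^{\mathrm{dis},t}$, i.e., determine the flight direction angle $\delta_n^t$, the moving speed $v_n^t$, and the transmit power $P_n^t$;
    # Movement stage
    Fly to the sensing position $x_n^{s,t}$ using the flight direction angle $\delta_n^t$ and the moving speed $v_n^t$;
    # Sensing stage
    Execute the sensing task and collect the task data $D_n^{s,t}$;
    # Transmission stage
    Send the task data back to BS $k$ with transmit power $P_n^t$;
    Obtain the reward $r_n^{t+1}$ according to Eq. (23) and observe $s^{t+1}$;
    Store the experience tuple $(s^t, a_n^t, r_n^{t+1}, s^{t+1})$ in the experience replay module $D_n$;
    If $t > N_c$: remove old experience tuples from the experience replay module $D_n$;
    # Train the networks
    Randomly sample a batch of $N_b$ experience tuples $(s^t, a_n^t, r_n^{t+1}, s^{t+1})$ from the experience replay module $D_n$;
    Update the current network parameters $\theta_n^t$ and $\omega_n^t$ according to Eqs. (29)–(34);
    Update the target network parameters $\hat{\theta}_n^t$ and $\hat{\omega}_n^t$ according to Eqs. (36) and (37);
  (b) Set $t = t + 1$, $s^t \leftarrow s^{t+1}$;
(4) Repeat step (3) until the algorithm ends.
Table 1 Simulation parameters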
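Parameter | Value
Number of UAVs N, number of sensing tasks M, number of BSs K | 3, 10, 2
Network radius r_c | 500 m
Channel bandwidth W | 1 MHz
BS height H_0 | 25 m
UAV minimum and maximum altitude h_min, h_max | 50 m, 100 m
UAV maximum flight speed v_max | 15 m/s
UAV maximum transmit power P_max | 30 dBm
Sensing parameter λ | 0.01
Environment parameters a, b | 9.61, 0.16
Additional path loss for LoS and NLoS η_LoS, η_NLoS | 1 dB, 20 dB
Carrier frequency f_c | 2 GHz
Noise power N_0 | –96 dBm
Table 2 Model hyperparameters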
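Hyperparameter | Value
Initial learning rates of the Actor and Critic networks α_θ, α_ω | 0.001, 0.002
Soft-update weight ρ | 0.01
Greedy rate ε | 0.1
Activation function | ReLU
Batch size N_b | 64
Experience replay size N_c | 20 000
Initial learning rate of the DQN network | 0.01
DQN target network update period | 100
Number of layers in the Actor and Critic networks | 4, 4
Neurons per hidden layer | 128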
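Two steps of Algorithm 1 can be illustrated with the hyperparameters above: the compound action selection (a discrete choice first, then a continuous action conditioned on it) and the soft update of the target networks. The sketch below is only a plausible reading under stated assumptions, since Eqs. (28), (36), and (37) are not reproduced here. It assumes that the discrete choice is ε-greedy with ε as the exploration probability and that the soft update has the common form target ← ρ·online + (1 − ρ)·target. The names `select_discrete`, `select_compound`, `soft_update`, `q_fn`, and `actor_fn` are hypothetical.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

RHO = 0.01      # soft-update weight from the hyperparameter table
EPSILON = 0.1   # greedy rate from the table, assumed to be the exploration probability


def select_discrete(q_values: Sequence[float],
                    epsilon: float = EPSILON,
                    rng: Any = random) -> int:
    """Epsilon-greedy choice over discrete actions (assumed selection rule)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def select_compound(state: Any,
                    q_fn: Callable[[Any], Sequence[float]],
                    actor_fn: Callable[[Any, int], Tuple[float, ...]],
                    epsilon: float = EPSILON,
                    rng: Any = random) -> Tuple[int, Tuple[float, ...]]:
    """Pick the discrete action first, then the continuous action given it."""
    a_dis = select_discrete(q_fn(state), epsilon, rng)
    a_con = actor_fn(state, a_dis)  # e.g. (direction angle, speed, transmit power)
    return a_dis, a_con


def soft_update(target: Sequence[float],
                online: Sequence[float],
                rho: float = RHO) -> List[float]:
    """Assumed soft update: target <- rho * online + (1 - rho) * target."""
    return [rho * w + (1.0 - rho) * w_hat for w, w_hat in zip(online, target)]


# Stand-in functions replacing the trained DQN and actor networks.
demo_q = lambda s: [0.2, 0.7, 0.1]
demo_actor = lambda s, a: (0.5 * a, 10.0, 20.0)
demo_action = select_compound(None, demo_q, demo_actor, epsilon=0.0)
demo_target = soft_update([0.0, 1.0], [1.0, 1.0])
```

With ρ = 0.01 the target parameters move only 1% of the way toward the online parameters at each update, which keeps the learning targets slowly varying.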
[1] SHRESTHA R, ROMERO D, and CHEPURI S P. Spectrum surveying: Active radio map estimation with autonomous UAVs[J]. IEEE Transactions on Wireless Communications, 2023, 22(1): 627–641. doi: 10.1109/TWC.2022.3197087. [2] NOMIKOS N, GKONIS P K, BITHAS P S, et al. A survey on UAV-aided maritime communications: Deployment considerations, applications, and future challenges[J]. IEEE Open Journal of the Communications Society, 2023, 4: 56–78. doi: 10.1109/OJCOMS.2022.3225590. [3] HARIKUMAR K, SENTHILNATH J, and SUNDARAM S. Multi-UAV oxyrrhis marina-inspired search and dynamic formation control for forest firefighting[J]. IEEE Transactions on Automation Science and Engineering, 2019, 16(2): 863–873. doi: 10.1109/TASE.2018.2867614. [4] QU Yuben, SUN Hao, DONG Chao, et al. Elastic collaborative edge intelligence for UAV Swarm: Architecture, challenges, and opportunities[J]. IEEE Communications Magazine, 2024, 62(1): 62–68. doi: 10.1109/MCOM.002.2300129. [5] ZHANG Tao, ZHU Kun, ZHENG Shaoqiu, et al. Trajectory design and power control for joint radar and communication enabled multi-UAV cooperative detection systems[J]. IEEE Transactions on Communications, 2023, 71(1): 158–172. doi: 10.1109/TCOMM.2022.3224751. [6] PAN Hongyang, LIU Yanheng, SUN Geng, et al. Joint power and 3D trajectory optimization for UAV-Enabled wireless powered communication networks with obstacles[J]. IEEE Transactions on Communications, 2023, 71(4): 2364–2380. doi: 10.1109/TCOMM.2023.3240697. [7] NGUYEN P X, NGUYEN V D, NGUYEN H V, et al. UAV-assisted secure communications in terrestrial cognitive radio networks: Joint power control and 3D trajectory optimization[J]. IEEE Transactions on Vehicular Technology, 2021, 70(4): 3298–3313. doi: 10.1109/TVT.2021.3062283. [8] ZENG Shuhao, ZHANG Hongliang, DI Boya, et al. Trajectory optimization and resource allocation for OFDMA UAV relay networks[J]. IEEE Transactions on Wireless Communications, 2021, 20(10): 6634–6647. doi: 10.1109/TWC.2021.3075594. 
[9] LI Peiming and XU Jie. Fundamental rate limits of UAV-enabled multiple access channel with trajectory optimization[J]. IEEE Transactions on Wireless Communications, 2020, 19(1): 458–474. doi: 10.1109/TWC.2019.2946153. [10] GUAN Yue, ZOU Sai, PENG Haixia, et al. Cooperative UAV trajectory design for disaster area emergency communications: A multiagent PPO method[J]. IEEE Internet of Things Journal, 2024, 11(5): 8848–8859. doi: 10.1109/JIOT.2023.3320796. [11] SILVIRIANTI, NAROTTAMA B, and SHIN S Y. Layerwise quantum deep reinforcement learning for joint optimization of UAV trajectory and resource allocation[J]. IEEE Internet of Things Journal, 2024, 11(1): 430–443. doi: 10.1109/JIOT.2023.3285968. [12] HU Jingzhi, ZHANG Hongliang, SONG Lingyang, et al. Cooperative internet of UAVs: Distributed trajectory design by multi-agent deep reinforcement learning[J]. IEEE Transactions on Communications, 2020, 68(11): 6807–6821. doi: 10.1109/TCOMM.2020.3013599. [13] WU Fanyi, ZHANG Hongliang, WU Jianjun, et al. Cellular UAV-to-device communications: Trajectory design and mode selection by Multi-Agent deep reinforcement learning[J]. IEEE Transactions on Communications, 2020, 68(7): 4175–4189. doi: 10.1109/TCOMM.2020.2986289. [14] DAI Xunhua, LU Zhiyu, CHEN Xuehan, et al. Multiagent RL-based joint trajectory scheduling and resource allocation in NOMA-assisted UAV swarm network[J]. IEEE Internet of Things Journal, 2024, 11(8): 14153–14167. doi: 10.1109/JIOT.2023.3340669. [15] ZHANG Zhongyu, LIU Yunpeng, LIU Tianci, et al. DAGN: A real-time UAV remote sensing image vehicle detection framework[J]. IEEE Geoscience and Remote Sensing Letters, 2020, 17(11): 1884–1888. doi: 10.1109/LGRS.2019.2956513. [16] YANG Jun, YOU Xinghui, WU Gaoxiang, et al. Application of reinforcement learning in UAV cluster task scheduling[J]. Future Generation Computer Systems, 2019, 95: 140–148. doi: 10.1016/j.future.2018.11.014. [17] NOBAR S K, AHMED M H, MORGAN Y, et al. 
Resource allocation in cognitive radio-enabled UAV communication[J]. IEEE Transactions on Cognitive Communications and Networking, 2022, 8(1): 296–310. doi: 10.1109/TCCN.2021.3103531. [18] CHEN Jiming, LI Junkun, and LAI T H. Energy-efficient intrusion detection with a barrier of probabilistic sensors: Global and local[J]. IEEE Transactions on Wireless Communications, 2013, 12(9): 4742–4755. doi: 10.1109/TW.2013.072313.122083. [19] SHAKHOV V V and KOO I. Experiment design for parameter estimation in probabilistic sensing models[J]. IEEE Sensors Journal, 2017, 17(24): 8431–8437. doi: 10.1109/JSEN.2017.2766089. [20] YANG Qianqian, HE Shibo, LI Junkun, et al. Energy-efficient probabilistic area coverage in wireless sensor networks[J]. IEEE Transactions on Vehicular Technology, 2015, 64(1): 367–377. doi: 10.1109/TVT.2014.2300181. [21] AL-HOURANI A, KANDEEPAN S, and LARDNER S. Optimal LAP altitude for maximum coverage[J]. IEEE Wireless Communications Letters, 2014, 3(6): 569–572. doi: 10.1109/LWC.2014.2342736. [22] ZHANG Xinyu and SHIN K G. E-MiLi: Energy-minimizing idle listening in wireless networks[J]. IEEE Transactions on Mobile Computing, 2012, 11(9): 1441–1454. doi: 10.1109/TMC.2012.112. [23] ZHU Changxi, DASTANI M, and WANG Shihan. A survey of multi-agent deep reinforcement learning with communication[J]. Autonomous Agents and Multi-Agent Systems, 2024, 38(1): 4. doi: 10.1007/s10458-023-09633-6. [24] 喻莞芯. 基于多智能体强化学习的无人机集群网络优化设计[D]. [硕士论文], 重庆大学, 2022. doi: 10.27670/d.cnki.gcqdu.2022.001082.YU Wanxin. Optimization design of UAV cluster network based on multi-agent reinforcement learning[D]. [Master dissertation], Chongqing University, 2022. doi: 10.27670/d.cnki.gcqdu.2022.001082. [25] SUTTON R S and BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge, USA: MIT Press, 1998. [26] WOOD L F. Training neural networks[P]. US, 4914603A, 1990. [27] SIPPER M. A serial complexity measure of neural networks[C]. 
IEEE International Conference on Neural Networks, San Francisco, USA, 1993: 962–966. doi: 10.1109/ICNN.1993.298687. [28] GUO Shaoai and ZHAO Xiaohui. Multi-agent deep reinforcement learning based transmission latency minimization for delay-sensitive cognitive satellite-UAV networks[J]. IEEE Transactions on Communications, 2023, 71(1): 131–144. doi: 10.1109/TCOMM.2022.3222460.