高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于线性滤波器的四旋翼无人机强化学习控制策略

华和安 方勇纯 钱辰 张雪涛

华和安, 方勇纯, 钱辰, 张雪涛. 基于线性滤波器的四旋翼无人机强化学习控制策略[J]. 电子与信息学报, 2021, 43(12): 3407-3417. doi: 10.11999/JEIT210251
引用本文: 华和安, 方勇纯, 钱辰, 张雪涛. 基于线性滤波器的四旋翼无人机强化学习控制策略[J]. 电子与信息学报, 2021, 43(12): 3407-3417. doi: 10.11999/JEIT210251
He’an HUA, Yongchun FANG, Chen QIAN, Xuetao ZHANG. Reinforcement Learning Control Strategy of Quadrotor Unmanned Aerial Vehicles Based on Linear Filter[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3407-3417. doi: 10.11999/JEIT210251
Citation: He’an HUA, Yongchun FANG, Chen QIAN, Xuetao ZHANG. Reinforcement Learning Control Strategy of Quadrotor Unmanned Aerial Vehicles Based on Linear Filter[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3407-3417. doi: 10.11999/JEIT210251

基于线性滤波器的四旋翼无人机强化学习控制策略

doi: 10.11999/JEIT210251
基金项目: 国家自然科学基金(61873132, 61633012)
详细信息
    作者简介:

    华和安:男,1995年生,博士生,研究方向为旋翼无人机的智能控制与规划

    方勇纯:男,1973年生,教授,研究方向为非线性控制、机器人视觉伺服、无人机和桥式吊车等欠驱动系统控制

    钱辰:男,1993年生,博士生,研究方向为扑翼飞行器和其他仿生机器人的设计和控制

    张雪涛:男,1992年生,副教授,研究方向为自主旋翼无人机的运动计划,视觉伺服,状态和干扰估计

    通讯作者:

    方勇纯 fangyc@nankai.edu.cn

  • 中图分类号: V279; TP273

Reinforcement Learning Control Strategy of Quadrotor Unmanned Aerial Vehicles Based on Linear Filter

Funds: The National Natural Science Foundation of China (61873132, 61633012)
  • 摘要: 针对四旋翼无人机(UAVs)系统,该文提出一种基于线性降阶滤波器的深度强化学习(RL)策略,进而设计了一种新型的智能控制方法,有效地提高了旋翼无人机对外界干扰和未建模动态的鲁棒性。首先,基于线性降阶滤波技术,设计了维数更少的滤波器变量作为深度网络的输入,减小了策略的探索空间,提高了策略的探索效率。在此基础上,为了增强策略对稳态误差的感知,该文结合滤波器变量和积分项,设计集总误差作为策略的新输入,提高了旋翼无人机的定位精度。该文的新颖之处在于,首次提出一种基于线性滤波器的深度强化学习策略,有效地消除了未知干扰和未建模动态对四旋翼无人机控制系统的影响,提高了系统的定位精度。对比实验结果表明,该方法能显著地提升旋翼无人机的定位精度和对干扰的鲁棒性。
  • 图  1  强化学习策略和旋翼无人机的交互

    图  2  四旋翼无人机控制策略学习过程示意图

    图  3  5种学习方法的训练结果及测试2曲线图

    图  4  测试3和测试4结果曲线图

    图  5  实验平台

    表  1  强化学习训练-评价算法

     随机初始化评论家和演员并且以相同的参数初始化对应的目标网络
     初始化回放缓冲区
     for i = 1 to 500 do
       随机初始化无人机位置,观测初始状态
       for j = 1 to 500 do
         根据控制器式(10)生成控制信号作用于无人机
         观测奖励值和下一状态
         将当前的交互数据保存在回放缓冲区中
         随机从缓冲区采样一组数据
         根据式(8)和式(9)更新评论家和演员网络
         根据式(6)更新目标网络
         if j = 500 do
           for k = 1 to 200 do
             测试当前策略
           end for
         end if
       end for
     end for
    下载: 导出CSV

    表  2  系统的参数

    参数参数
    $m$$1.6{\text{ kg}}$${k_i},i = x,y,z$$ 0.1,{\text{ }}0.1,{\text{ }}0.1 $
    ${\boldsymbol{J}}$${\text{diag}}[0.01,0.01,0.02]{\text{ kg}} \cdot {{\text{m}}^{\text{2}}}$${{\boldsymbol{K}}_1}$${\text{diag}}[3.8,3.8,3.5]{\text{ }}$
    $\tau $$ 0.001 $${{\boldsymbol{K}}_2}$${\text{diag}}[5.0,5.0,4.5]{\text{ }}$
    $g$$9.8{\text{ m/}}{{\text{s}}^{\text{2}}}$$\mathcal{B}$$10000$
    $\boldsymbol{\sigma }$$ \left[ {0.1,0.1,1.0} \right] $$\gamma $$0.95$
    $ {\beta _i},i = x,y,z $$ 0.1,{\text{ }}0.1,{\text{ }}0.1 $$N$$64$
    ${\alpha _\omega }$$0.0001$$ {\alpha _\mu } $$0.0001$
    下载: 导出CSV

    表  3  测试2:最大稳态误差和均方根误差

    方法X轴(m)Y轴(m)Z轴(m)
    MSSERMSEMSSERMSEMSSERMSE
    本文算法0.00750.00350.02160.02120.00780.0034
    DDPG1.49041.36610.11400.08211.40021.4001
    DPGIC2.23562.06561.70601.65711.40031.3998
    GC-DDPG0.07160.06980.12970.12270.17400.1689
    GC-DPGIC0.08030.07880.17570.14390.29100.1114
    下载: 导出CSV

    表  4  测试3:最大稳态误差和均方根误差

    方法X轴(m)Y轴(m)Z轴 (m)
    MSSERMSEMSSERMSEMSSERMSE
    本文算法0.47150.08870.32070.06470.03560.0203
    DDPG0.62390.59040.74800.64401.40021.4001
    DPGIC56.4541.6815.2611.803.7921.497
    GC-DDPG0.55430.11700.29180.13200.53820.1896
    GC-DPGIC0.72500.22280.51860.09120.60810.2014
    下载: 导出CSV

    表  5  测试4:最大稳态误差和均方根误差

    方法X轴(m)Y轴(m)Z轴(m)
    MSSERMSEMSSERMSEMSSERMSE
    几何控制器0.07300.04370.12210.05950.06700.0371
    所提控制器0.03400.01170.08020.03480.05350.0203
    下载: 导出CSV
  • [1] 张坤, 高晓光. 未知风场扰动下无人机三维航迹跟踪鲁棒最优控制[J]. 电子与信息学报, 2015, 37(12): 3009–3015.

    ZHANG Kun and GAO Xiaoguang. Robust optimal control for unmanned aerial vehicles’ three-dimensional trajectory tracking in wind disturbance[J]. Journal of Electronics &Information Technology, 2015, 37(12): 3009–3015.
    [2] 宋大雷, 齐俊桐, 韩建达, 等. 旋翼飞行机器人系统建模与主动模型控制理论及实验研究[J]. 自动化学报, 2011, 37(4): 480–495. doi: 10.3724/SP.J.1004.2011.00480

    SONG Dalei, QI Juntong, HAN Jianda, et al. Model identification and active modeling control for rotor fly-robot: Theory and experiment[J]. Acta Automatica Sinica, 2011, 37(4): 480–495. doi: 10.3724/SP.J.1004.2011.00480
    [3] 孟祥冬, 何玉庆, 韩建达. 接触作业型飞行机械臂系统的力/位置混合控制[J]. 机器人, 2020, 42(2): 167–178.

    MENG Xiangdong, HE Yuqing, and HAN Jianda. Hybrid force/position control of aerial manipulators in contact operation[J]. Robot, 2020, 42(2): 167–178.
    [4] 王诗章, 鲜斌, 杨森. 无人机吊挂飞行系统的减摆控制设计[J]. 自动化学报, 2018, 44(10): 1771–1780.

    WANG Shizhang, XIAN Bin, and YANG Sen. Anti-swing controller design for an unmanned aerial vehicle with a slung-load[J]. Acta Automatica Sinica, 2018, 44(10): 1771–1780.
    [5] 甄子洋. 舰载无人机自主着舰回收制导与控制研究进展[J]. 自动化学报, 2019, 45(4): 669–681.

    ZHEN Ziyang. Research development in autonomous carrier-landing/ship-recovery guidance and control of unmanned aerial vehicles[J]. Acta Automatica Sinica, 2019, 45(4): 669–681.
    [6] 赵太飞, 宫春杰, 张港, 等. 一种无人机集群安全高效的分区集结控制策略[J]. 电子与信息学报, 2021, 43(8): 2181–2188. doi: 10.11999/JEIT200601

    ZHAO Taifei, GONG Chunjie, ZHANG Gang, et al. A safe and high efficiency control strategy of unmanned aerial vehicles partition rendezvous[J]. Journal of Electronics and Information Technology, 2021, 43(8): 2181–2188. doi: 10.11999/JEIT200601
    [7] 李瑞涵, 王耀南, 谭建豪. Nesterov加速梯度无人机姿态融合算法[J]. 机器人, 2018, 40(6): 852–859.

    LI Ruihan, WANG Yaonan, and TAN Jianhao. Attitude fusion algorithm of UAV based on Nesterov accelerated gradient[J]. Robot, 2018, 40(6): 852–859.
    [8] 高杨, 李东生, 程泽新. 无人机分布式集群态势感知模型研究[J]. 电子与信息学报, 2018, 40(6): 1271–1278. doi: 10.11999/JEIT170877

    GAO Yang, LI Dongsheng, and CHENG Zexin. UAV distributed swarm situation awareness model[J]. Journal of Electronics &Information Technology, 2018, 40(6): 1271–1278. doi: 10.11999/JEIT170877
    [9] ZHENG Dongliang, WANG Hesheng, WANG Jingchuan, et al. Toward visibility guaranteed visual servoing control of quadrotor UAVs[J]. IEEE/ASME Transactions on Mechatronics, 2019, 24(3): 1087–1095. doi: 10.1109/TMECH.2019.2906430
    [10] ZHANG Xuetao, FANG Yongchun, ZHANG Xuebao, et al. A novel geometric hierarchical approach for dynamic visual servoing of quadrotors[J]. IEEE Transactions on Industrial Electronics, 2020, 67(5): 3840–3849. doi: 10.1109/TIE.2019.2917420
    [11] MAHONY R and HAMEL T. Image-based visual servo control of aerial robotic systems using linear image features[J]. IEEE Transactions on Robotics, 2005, 21(2): 227–239. doi: 10.1109/TRO.2004.835446
    [12] LIU Hao, ZHAO Wanbin, ZUO Zongyu, et al. Robust control for quadrotors with multiple time-varying uncertainties and delays[J]. IEEE Transactions on Industrial Electronics, 2017, 64(2): 1303–1312. doi: 10.1109/TIE.2016.2612618
    [13] HUA He’an, FANG Yongchun, ZHANG Xuetao, et al. Auto-tuning nonlinear PID-type controller for rotorcraft-based aggressive transportation[J]. Mechanical Systems and Signal Processing, 2020, 145: 106858. doi: 10.1016/j.ymssp.2020.106858
    [14] ZUO Zongyu and MALLIKARJUNAN S. L1 adaptive backstepping for robust trajectory tracking of UAVs[J]. IEEE Transactions on Industrial Electronics, 2017, 64(4): 2944–2954. doi: 10.1109/TIE.2016.2632682
    [15] LV Zongyang, LI Shengming, WU Yuhu, et al. Adaptive control for a quadrotor transporting a cable-suspended payload with unknown mass in the presence of rotor downwash[J]. IEEE Transactions on Vehicular Technology, 2021, 70(9): 8505–8518. doi: 10.1109/TVT.2021.3096234
    [16] TIAN Bailing, YIN Liping, and WANG Hong. Finite-time reentry attitude control based on adaptive multivariable disturbance compensation[J]. IEEE Transactions on Industrial Electronics, 2015, 62(9): 5889–5898. doi: 10.1109/TIE.2015.2442224
    [17] XIAN Bin and HAO Wei. Nonlinear robust fault-tolerant control of the tilt trirotor UAV under rear servo's stuck fault: Theory and experiments[J]. IEEE Transactions on Industrial Informatics, 2019, 15(4): 2158–2166. doi: 10.1109/TII.2018.2858143
    [18] SHI Haobin, LI Xuesi, HWANG K S, et al. Decoupled visual servoing with fuzzy Q-learning[J]. IEEE Transactions on Industrial Informatics, 2018, 14(1): 241–252. doi: 10.1109/TII.2016.2617464
    [19] HWANGBO J, SA I, SIEGWART R, et al. Control of a quadrotor with reinforcement learning[J]. IEEE Robotics and Automation Letters, 2017, 2(4): 2096–2103. doi: 10.1109/LRA.2017.2720851
    [20] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529–533. doi: 10.1038/nature14236
    [21] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]. The 31st International Conference on Machine Learning, Beijing, China, 2014: 387–395.
    [22] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]. Proceedings of the 4th International Conference on Learning Representations, San Juan, Puerto Rico, 2016: 1–14.
    [23] RODRIGUEZ-RAMOS A, SAMPEDRO C, BAVLE H, et al. A deep reinforcement learning strategy for UAV autonomous landing on a moving platform[J]. Journal of Intelligent & Robotic Systems, 2019, 93(1/2): 351–366.
    [24] WANG Yuanda, SUN Jia, HE Haibo, et al. Deterministic policy gradient with integral compensator for robust quadrotor control[J]. IEEE Transactions on Systems, Man, and Cybernetics:Systems, 2020, 50(10): 3713–3725. doi: 10.1109/TSMC.2018.2884725
    [25] WEI Qinglai, WANG Lingxiao, LIU Yu, et al. Optimal elevator group control via deep asynchronous actor-critic learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(12): 5245–5256. doi: 10.1109/TNNLS.2020.2965208
    [26] LEE T, LEOK M, and MCCLAMROCH N H. Geometric tracking control of a quadrotor UAV on SE(3)[C]. The 49th IEEE Conference on Decision and Control, Atlanta, USA, 2010: 5420–5425.
    [27] FURRER F, BURRI M, ACHTELIK M, et al. RotorS-a Modular Gazebo MAV Simulator Framework[M]. KOUBAA A. Robot Operating System (ROS): The Complete Reference (Volume 1). Cham: Springer, 2016: 595–625.
  • 加载中
图(5) / 表(5)
计量
  • 文章访问数:  1064
  • HTML全文浏览量:  799
  • PDF下载量:  154
  • 被引次数: 0
出版历程
  • 收稿日期:  2021-03-26
  • 修回日期:  2021-10-20
  • 网络出版日期:  2021-10-27
  • 刊出日期:  2021-12-21

目录

    /

    返回文章
    返回