A Lightweight Semantic Visual Simultaneous Localization and Mapping Framework for Inspection Robots in Dynamic Environments

YU Haoyang, LI Yansheng, XIAO Lingli, ZHOU Jiyuan

Citation: YU Haoyang, LI Yansheng, XIAO Lingli, ZHOU Jiyuan. A Lightweight Semantic Visual Simultaneous Localization and Mapping Framework for Inspection Robots in Dynamic Environments[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250301


doi: 10.11999/JEIT250301 cstr: 32379.14.JEIT250301
    About the authors:

    YU Haoyang: male, M.S., research interest: multi-sensor-based visual simultaneous localization and mapping for outdoor inspection robots

    LI Yansheng: male, Associate Professor, research interests: robotics and intelligent manufacturing technology

    XIAO Lingli: female, M.S., research interest: intelligent mobile robots

    ZHOU Jiyuan: male, M.S., research interest: embodied intelligent robots

    Corresponding author:

    LI Yansheng, liyansheng@cqupt.edu.cn

  • CLC number: TN919.85


Funds: The National Natural Science Foundation of China (52575102), Chongqing Urban Administration Bureau Project (Urban Management Science Section 2022-34), Chongqing Natural Science Foundation (CSTB2022NSCQ-MSX0340)
  • Abstract: To improve the localization accuracy and robustness of inspection robots in dynamic urban environments, this paper proposes a lightweight semantic visual Simultaneous Localization and Mapping (SLAM) framework built on the third-generation Oriented FAST and Rotated BRIEF SLAM system (ORB-SLAM3). The framework tightly couples the semantic information output by the proposed lightweight semantic segmentation model (DHSR-YOLOSeg) to remove dynamic feature points accurately and keep tracking robust, thereby alleviating the feature drift and accumulated mapping error caused by dynamic objects. DHSR-YOLOSeg is built on the YOLOv11n-seg architecture and integrates a dynamic convolution module (C3k2_DynamicConv), a lightweight feature-fusion module (DyCANet), and a Reused Shared Convolution Segmentation (RSCS) head; it slightly improves segmentation accuracy while markedly reducing computational cost, making the model both lightweight and efficient. On the COCO dataset, compared with the baseline model, DHSR-YOLOSeg reduces the number of parameters by 13.8%, lowers Giga Floating-Point Operations (GFLOPs) by 23.1%, and raises mAP50 by about 2%. On the KITTI dataset, compared with other mainstream segmentation models and different YOLO variants, DHSR-YOLOSeg further compresses model parameters and computational overhead while maintaining high segmentation accuracy, and the overall system frame rate improves accordingly. Meanwhile, the proposed semantic SLAM system improves localization accuracy through dynamic feature-point removal, lowering the average trajectory error by 8.78% compared with ORB-SLAM3; on this basis, the average per-frame processing time is about 18.55% and 41.83% lower than that of the mainstream methods DS-SLAM and DynaSLAM, respectively. The results show that the proposed semantic visual SLAM framework combines real-time performance with deployment efficiency and significantly improves localization stability and perception capability in dynamic environments.
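
    The tight coupling described in the abstract — projecting the segmentation model's instance masks onto the current frame and discarding ORB feature points that land on movable objects before pose tracking — can be illustrated with a minimal sketch. The class list, mask format, and function names below are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np
import cv2

# Illustrative set of movable object classes (an assumption; the paper's
# actual list of dynamic categories may differ).
DYNAMIC_CLASSES = {"person", "car", "bus", "truck", "bicycle", "motorcycle"}

def build_dynamic_mask(instance_masks, class_names, dilate_px=5):
    """Merge per-instance masks of dynamic classes into one binary HxW mask.

    instance_masks: list of HxW uint8 arrays (1 = object pixel), as a
    YOLO-style segmentation head might yield after thresholding.
    """
    if not instance_masks:
        return None
    h, w = instance_masks[0].shape
    merged = np.zeros((h, w), dtype=np.uint8)
    for mask, name in zip(instance_masks, class_names):
        if name in DYNAMIC_CLASSES:
            merged |= (mask > 0).astype(np.uint8)
    # Dilate slightly so keypoints on object boundaries are also rejected.
    kernel = np.ones((2 * dilate_px + 1, 2 * dilate_px + 1), np.uint8)
    return cv2.dilate(merged, kernel)

def filter_keypoints(keypoints, descriptors, dynamic_mask):
    """Drop ORB keypoints (cv2.KeyPoint) whose pixel lies inside the mask."""
    if dynamic_mask is None:
        return keypoints, descriptors
    h, w = dynamic_mask.shape
    kept_kp, kept_rows = [], []
    for i, kp in enumerate(keypoints):
        u = min(int(round(kp.pt[0])), w - 1)
        v = min(int(round(kp.pt[1])), h - 1)
        if dynamic_mask[v, u] == 0:          # static background pixel: keep
            kept_kp.append(kp)
            kept_rows.append(i)
    return kept_kp, descriptors[kept_rows]   # descriptors: NxD numpy array
```

    In the full system the filtered keypoints and descriptors would simply replace the original set inside the ORB-SLAM3 tracking thread, so that pose estimation and mapping are driven only by static scene points.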
  • Figure 1  System framework integrating semantic segmentation with ORB-SLAM3

    Figure 2  Structure of the YOLOv11n-seg model

    Figure 3  Schematic of the MoE mechanism

    Figure 4  Overall upsampling pipeline of DySample

    Figure 5  Structure of ChannelAttention_HSFPN

    Figure 6  Structure of the YOLOv11n segmentation head

    Figure 7  Structure of the RSCD head

    Figure 8  Network structure of DHSR-YOLOSeg

    Figure 9  Feature points before and after dynamic-point removal

    Figure 10  Semantic segmentation of dynamic objects by DHSR-YOLOSeg on different KITTI sequences

    Table 1  Ablation experiment results

    Experiment  Module 1  Module 2  Module 3  Params (M)  GFLOPs  Model size (MB)  Accuracy  mAP50
    Baseline    ×         ×         ×         2.83        10.40   6.00             0.76      0.51
    1           √         ×         ×         3.71        10.00   7.80             0.76      0.52
    2           ×         √         ×         2.08        9.00    4.50             0.75      0.50
    3           ×         ×         √         2.58        9.10    7.00             0.76      0.51
    4           √         √         ×         2.63        8.90    5.60             0.76      0.51
    5           √         ×         √         3.44        8.90    8.70             0.75      0.52
    6           ×         √         √         1.89        8.10    4.60             0.73      0.49
    7           √         √         √         2.44        8.00    5.70             0.77      0.51

    Table 2  Comparison experiment results

    Model          Params (M)  GFLOPs  mAP50  FPS (frames/s)
    Mask R-CNN     35.92       67.16   0.47   16.05
    YOLOv5n-seg    2.05        4.31    0.48   57.89
    YOLOv8n-seg    3.26        6.78    0.50   56.51
    YOLOv11n-seg   2.84        5.18    0.50   58.93
    DHSR-YOLOSeg   1.45        4.49    0.51   60.19
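
    The FPS column in Table 2 reflects single-image inference throughput of each segmentation model. The paper does not spell out its measurement protocol, so the following is only a generic sketch under common assumptions: time repeated forward passes after a warm-up phase and report iterations per second.

```python
import time
import numpy as np

def measure_fps(infer_fn, frame, warmup=20, iters=200):
    """Average frames per second of a single-image inference callable.

    infer_fn is any callable taking one image and returning predictions
    (for example, a wrapped segmentation model); frame is an HxWx3 image.
    """
    for _ in range(warmup):              # let caches / GPU kernels settle
        infer_fn(frame)
    start = time.perf_counter()
    for _ in range(iters):
        infer_fn(frame)
    elapsed = time.perf_counter() - start
    return iters / elapsed

if __name__ == "__main__":
    # Dummy stand-ins: a KITTI-sized blank image and a trivial "model".
    dummy_frame = np.zeros((375, 1242, 3), dtype=np.uint8)
    dummy_model = lambda img: img.mean()
    print(f"{measure_fps(dummy_model, dummy_frame):.1f} FPS")
```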

    Table 3  Trajectory RMSE comparison of different semantic SLAM methods on KITTI sequences (m)

    Sequence  ORB-SLAM3  DynaSLAM  DS-SLAM  Ours
    00        1.490      1.009     1.249    1.296
    01        13.319     17.471    15.395   12.873
    02        4.241      4.492     4.166    3.531
    04        0.273      0.254     0.261    0.244
    05        0.994      0.833     0.924    0.947
    06        1.121      0.716     1.219    1.162
    07        0.630      0.658     0.644    0.534
    08        3.517      2.475     3.973    3.466
    09        1.697      1.261     1.779    1.715
    10        1.046      0.752     1.026    0.993
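
    The RMSE values in Table 3 are absolute trajectory errors: the root mean square of the translational distance between estimated and ground-truth camera positions after rigidly aligning the two trajectories. The sketch below shows the standard computation, assuming time-synchronized Nx3 position arrays (file parsing omitted); it is not the paper's evaluation script.

```python
import numpy as np

def align_umeyama(est, gt):
    """Closed-form rigid (SE(3), no scale) alignment of est onto gt, both Nx3."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E)                        # SVD of cross-covariance
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflection
    R = U @ S @ Vt                                           # optimal rotation
    t = mu_g - R @ mu_e                                      # optimal translation
    return R, t

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE, in metres) after rigid alignment."""
    R, t = align_umeyama(est, gt)
    err = gt - (est @ R.T + t)                               # per-pose translational error
    return float(np.sqrt((np.linalg.norm(err, axis=1) ** 2).mean()))

# est and gt would be Nx3 position arrays parsed from time-synchronized
# estimated and ground-truth KITTI pose files (parsing omitted here).
```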

    Table 4  Average per-frame tracking time of different semantic SLAM methods (ms)

    Sequence  ORB-SLAM3  DynaSLAM  DS-SLAM  Ours
    00        36.25      84.91     64.75    44.27
    01        33.47      78.78     57.08    41.90
    02        37.07      86.71     61.86    50.63
    04        36.01      84.38     57.38    52.27
    05        36.71      85.92     59.37    50.11
    06        36.74      85.99     61.09    48.45
    07        35.23      82.66     59.89    48.95
    08        36.90      86.34     63.92    54.76
    09        35.37      82.97     58.18    50.62
    10        34.61      81.29     56.37    46.69
  • [1] HALDER S and AFSARI K. Robots in inspection and monitoring of buildings and infrastructure: A systematic review[J]. Applied Sciences, 2023, 13(4): 2304. doi: 10.3390/app13042304.
    [2] LI Yuhao, FU Chengguo, YANG Hui, et al. Design of a closed piggery environmental monitoring and control system based on a track inspection robot[J]. Agriculture, 2023, 13(8): 1501. doi: 10.3390/agriculture13081501.
    [3] LUO Zhaoyang, ZHANG Rongfen, LIU Yuhong, et al. Pedestrian intent semantic VSLAM in automatic driving scenarios[J]. Computer Engineering and Applications, 2024, 60(17): 107–116. doi: 10.3778/j.issn.1002-8331.2306-0159.
    [4] LI Guofeng, TAN Rong, and CAO Yuanyuan. Handheld SLAM: Emerging techniques and practical implementations in urban surveying[J]. Bulletin of Surveying and Mapping, 2024(S2): 255–259. doi: 10.13474/j.cnki.11-2246.2024.S253.
    [5] ZHANG Tianzhe and DAI Jun. Electric power intelligent inspection robot: A review[J]. Journal of Physics: Conference Series, 2021, 1750(1): 012023. doi: 10.1088/1742-6596/1750/1/012023.
    [6] MUR-ARTAL R, MONTIEL J M M, and TARDÓS J D. ORB-SLAM: A versatile and accurate monocular SLAM system[J]. IEEE Transactions on Robotics, 2015, 31(5): 1147–1163. doi: 10.1109/TRO.2015.2463671.
    [7] MUR-ARTAL R and TARDÓS J D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE Transactions on Robotics, 2017, 33(5): 1255–1262. doi: 10.1109/TRO.2017.2705103.
    [8] CAMPOS C, ELVIRA R, RODRÍGUEZ J J G, et al. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM[J]. IEEE Transactions on Robotics, 2021, 37(6): 1874–1890. doi: 10.1109/TRO.2021.3075644.
    [9] QIN Tong, LI Peiliang, and SHEN Shaojie. VINS-Mono: A robust and versatile monocular visual-inertial state estimator[J]. IEEE Transactions on Robotics, 2018, 34(4): 1004–1020. doi: 10.1109/TRO.2018.2853729.
    [10] ZANG Qiuyu, ZHANG Kehua, WANG Ling, et al. An adaptive ORB-SLAM3 system for outdoor dynamic environments[J]. Sensors, 2023, 23(3): 1359. doi: 10.3390/s23031359.
    [11] WU Hangbin, ZHAN Shihao, SHAO Xiaohang, et al. SLG-SLAM: An integrated SLAM framework to improve accuracy using semantic information, laser and GNSS data[J]. International Journal of Applied Earth Observation and Geoinformation, 2024, 133: 104110. doi: 10.1016/j.jag.2024.104110.
    [12] BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes[J]. IEEE Robotics and Automation Letters, 2018, 3(4): 4076–4083. doi: 10.1109/LRA.2018.2860039.
    [13] VINCENT J, LABBÉ M, LAUZON J S, et al. Dynamic object tracking and masking for visual SLAM[C]. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, USA, 2020: 4974–4979. doi: 10.1109/IROS45743.2020.9340958.
    [14] KHANAM R and HUSSAIN M. YOLOv11: An overview of the key architectural enhancements[EB/OL]. https://arxiv.org/abs/2410.17725, 2024.
    [15] XU Ziheng, NIU Jianwei, LI Qingfeng, et al. NID-SLAM: Neural implicit representation-based RGB-D SLAM in dynamic environments[C]. 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, Canada, 2024: 1–6. doi: 10.1109/ICME57554.2024.10687512.
    [16] GONG Can, SUN Ying, ZOU Chunlong, et al. Real-time visual SLAM based YOLO-fastest for dynamic scenes[J]. Measurement Science and Technology, 2024, 35(5): 056305. doi: 10.1088/1361-6501/ad2669.
    [17] WU Peiyi, TONG Pengfei, ZHOU Xin, et al. Dyn-DarkSLAM: YOLO-based visual SLAM in low-light conditions[C]. 2024 IEEE 25th China Conference on System Simulation Technology and its Application (CCSSTA), Tianjin, China, 2024: 346–351. doi: 10.1109/CCSSTA62096.2024.10691775.
    [18] ZHANG Ruidong and ZHANG Xinguang. Geometric constraint-based and improved YOLOv5 semantic SLAM for dynamic scenes[J]. ISPRS International Journal of Geo-Information, 2023, 12(6): 211. doi: 10.3390/ijgi12060211.
    [19] YANG Tingting, JIA Shuwen, YU Ying, et al. Enhancing visual SLAM in dynamic environments with improved YOLOv8[C]. The Sixteenth International Conference on Digital Image Processing (ICDIP), Haikou, China, 2024: 132741Y. doi: 10.1117/12.3037734.
    [20] HAN Kai, WANG Yunhe, GUO Jianyuan, et al. ParameterNet: Parameters are all you need for large-scale visual pretraining of mobile networks[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 15751–15761. doi: 10.1109/CVPR52733.2024.01491.
    [21] YU Jiazuo, ZHUGE Yunzhi, ZHANG Lu, et al. Boosting continual learning of vision-language models via mixture-of-experts adapters[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 23219–23230. doi: 10.1109/CVPR52733.2024.02191.
    [22] LIU Wenze, LU Hao, FU Hongtao, et al. Learning to upsample by learning to sample[C]. The IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 6004–6014. doi: 10.1109/ICCV51070.2023.00554.
    [23] HUANGFU Yi, HUANG Zhonghao, YANG Xiaogang, et al. HHS-RT-DETR: A method for the detection of citrus greening disease[J]. Agronomy, 2024, 14(12): 2900. doi: 10.3390/agronomy14122900.
    [24] CAO Qi, CHEN Hang, WANG Shang, et al. LH-YOLO: A lightweight and high-precision SAR ship detection model based on the improved YOLOv8n[J]. Remote Sensing, 2024, 16(22): 4340. doi: 10.3390/rs16224340.
    [25] YU Chao, LIU Zuxin, LIU Xinjun, et al. DS-SLAM: A semantic visual SLAM towards dynamic environments[C]. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018: 1168–1174. doi: 10.1109/IROS.2018.8593691.
Publication history
  • Received: 2025-04-25
  • Revised: 2025-07-28
  • Published online: 2025-08-04
