高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

合双分支优化SAM与全局-局部协同匹配的单样本目标检测方法

樊盛华 尹航 刘俭 瞿涛

樊盛华, 尹航, 刘俭, 瞿涛. 合双分支优化SAM与全局-局部协同匹配的单样本目标检测方法[J]. 电子与信息学报. doi: 10.11999/JEIT250982
引用本文: 樊盛华, 尹航, 刘俭, 瞿涛. 合双分支优化SAM与全局-局部协同匹配的单样本目标检测方法[J]. 电子与信息学报. doi: 10.11999/JEIT250982
FAN Shenghua, YIN Hang, LIU Jian, QU Tao. A One-Shot Object Detection Method Fusing Dual-Branch Optimized SAM and Global-Local Collaborative Matching[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250982
Citation: FAN Shenghua, YIN Hang, LIU Jian, QU Tao. A One-Shot Object Detection Method Fusing Dual-Branch Optimized SAM and Global-Local Collaborative Matching[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250982

合双分支优化SAM与全局-局部协同匹配的单样本目标检测方法

doi: 10.11999/JEIT250982 cstr: 32379.14.JEIT250982
基金项目: 国家自然科学基金(82571371)
详细信息
    作者简介:

    樊盛华:男,博士后,研究方向为图像目标检测、目标跟踪

    尹航:男,助理研究员,研究方向为星载SAR遥感处理

    刘俭:男,助理研究员,研究方向为天线波束控制技术与应用、一体化数字综合控制技术与应用、图像及数据处理

    瞿涛:男,副教授,研究方向为图像目标检测、目标跟踪

    通讯作者:

    瞿涛 qutaowhu@whu.edu.cn

  • 中图分类号: TP391.41

A One-Shot Object Detection Method Fusing Dual-Branch Optimized SAM and Global-Local Collaborative Matching

Funds: The National Natural Science Foundation of China (82571371)
  • 摘要: 单样本目标检测(OSOD)可有效缓解传统目标检测对大规模标注数据的依赖,但现有方法存在新类别特征表达不足、类间相似性高导致判别边界模糊等问题。为此,该文提出基于双分支优化SAM与全局-局部协同匹配的单样本目标检测算法DOS-GLNet。首先,设计基于SAM的双分支微调特征提取网络:全局微调分支通过在SAM的编码器中插入轻量级适配模块,在保留通用视觉表征能力的同时适配检测任务;局部感知分支通过卷积网络提取局部空间节并与全局特征跨层融合,弥补SAM局部信息缺失;结合多尺度特征生成模块,将SAM单尺度特征扩展为多尺度特征金字塔,适配不同尺寸目标检测需求。其次,构建全局-局部协同的两阶段匹配机制:全局匹配模块(GMM)通过余弦相似度度量查询与目标特征的全局相关性,引导区域提议网络生成高质量候选区域;双向局部匹配模块(BLMM)通过稠密4D相关性张量捕捉查询与目标的双向语义关系,实现细粒度特征对齐。在Pascal VOC与MS COCO数据集上的实验表明,DOS-GLNet相较于主流OSOD算法在检测精度上具有显著优势,消融实验验证了各创新模块对检测性能的有效增益,可降低深度学习目标检测网络对海量标注数据的依赖。
  • 图  1  DOS-GLNet算法框架图

    图  2  基于SAM的全局-局部双分支微调模块

    图  3  局部感知分支

    图  4  传统的特征金字塔网络与基于SAM的多尺度生成模块

    图  5  全局匹配模块结构

    图  6  双向局部匹配模块结构

    图  7  算法在MS COCO和PascalVOC数据集上的检测结果与真实框可视化

    图  8  不同查询样本图像引导下算法的检测结果

    图  9  BLMM模块的中间输出特征可视化结果

    表  1  Pascal VOC和MS COCO数据集统计

    数据集图片数量样本数量类别数平均目标数/图
    VOC 2007[33]996324640202.5
    VOC 2012[33]1154027450201.6
    MS COCO[34]123287941079807.3
    下载: 导出CSV

    表  2  不同算法在 Pascal VOC 数据集上的检测结果

    算法 基类
    plant sofa tv car bottle boat chair person bus train horse bike
    SiamRPN[35] 1.90 15.70 4.50 12.80 1.00 1.10 6.10 8.70 7.90 6.90 17.40 17.80
    OSCD[26] 28.40 41.50 65.00 66.40 37.10 49.80 16.20 31.70 69.70 73.10 75.60 71.60
    CoAE[30] 30.00 54.90 64.10 66.70 40.10 54.10 14.70 60.900 77.50 78.30 77.90 73.20
    AIT[36] 46.40 60.50 68.00 73.60 49.00 65.10 26.60 68.20 82.60 85.40 82.90 77.10
    BHRL[37] 57.50 49.40 76.80 80.40 61.20 58.40 48.10 83.30 74.30 87.30 80.10 81.00
    BSPG[15] 55.00 55.60 78.30 81.40 62.50 59.50 50.90 81.70 74.70 87.50 82.10 81.00
    DOS-GLNet(本方法) 58.20 48.60 73.10 84.80 62.90 62.80 46.30 86.60 74.30 87.10 84.60 79.70
    下载: 导出CSV
    续表2 不同算法在 Pascal VOC 数据集上的检测结果
    算法 基类 新类 平均AP50
    bird motobike table 平均AP50 cow sheep cat Aeroplane 平均AP50
    SiamRPN[35] 7.20 18.50 5.10 9.60 15.90 15.70 21.70 3.50 14.20 11.90
    OSCD[26] 52.30 63.40 39.80 52.70 75.30 60.00 47.90 25.30 52.10 52.40
    CoAE[30] 70.80 70.80 46.20 60.10 83.90 67.10 75.60 46.20 68.20 64.20
    AIT[36] 71.80 75.10 60.00 67.20 85.50 72.80 80.40 50.20 72.20 69.70
    BHRL[37] 730 78.80 38.80 69.70 81.00 67.90 85.40 59.30 72.60 71.20
    BSPG[15] 73.90 79.10 39.50 70.50 80.60 67.40 84.60 61.40 73.50 72.00
    DOS-GLNet(本方法) 68.30 78.90 32.90 69.60 84.20 72.00 85.50 70.10 77.90 73.80
    下载: 导出CSV

    表  3  Pascal VOC 数据集对比结果

    方法发表年份基类 AP50新类 AP50平均 AP50
    FSOD[8]202368.5073.2070.85
    OSOD-ViT[38]202469.0075.5072.25
    DOS-GLNet(本方法)-69.6077.9073.75
    下载: 导出CSV

    表  4  不同算法在MS COCO数据集上的检测结果

    算法名称 基类 新类 平均AP50
    Split1 Split2 Split3 Split4 平均AP50 Split1 Split2 Split3 Split4 平均AP50
    SiamMask[39] 38.90 37.10 37.80 36.60 37.60 15.30 17.60 17.40 17.00 16.80 27.20
    CoAE[30] 42.20 40.20 39.90 41.30 40.90 23.40 23.60 20.50 20.40 22.00 31.20
    AIT[36] 50.10 47.20 45.80 46.90 47.50 26.00 26.40 22.30 22.60 24.30 35.90
    BHRL[37] 56.00 52.10 52.60 53.40 53.50 26.10 29.00 22.70 24.50 25.60 39.60
    BSPG[15] 57.10 54.10 54.00 54.60 55.0 27.70 30.70 24.60 26.30 27.30 41.20
    DOS-GLNet (本方法) 57.00 54.60 55.10 54.90 55.4 30.30 32.90 26.10 27.80 29.30 42.40
    下载: 导出CSV

    表  5  MS COCO 数据集对比结果

    方法发表年份基类 AP50新类 AP50平均 AP50
    FSOD[8]202353.2026.5039.85
    OSR-ViT[36]202454.0027.8040.90
    DOS-GLNet (本方法)-55.4029.3042.35
    下载: 导出CSV

    表  6  不同主干网络和多尺度网络的实验结果

    实验组主干网络多尺度网络基类AP50新类AP50
    -ResNetFPN68.8075.30
    本文框架SAM-ViT-BMSFG69.6077.90
    本文框架SAM-ViT-LMSFG70.2078.50
    本文框架SAM-ViT-HMSFG70.8079.10
    下载: 导出CSV

    表  7  全局-局部匹配模块消融实验结果

    RPN网络检测头基类AP50新类AP50
    SiamMask[39]BLMM69.1074.30
    GMM (本方法)Reweighting67.4073.20
    GMM (本方法)Depthwise Cross Correlation68.7076.80
    GMM (本方法)Non-local Attention68.1075.50
    GMM (本方法)BLMM69.6077.90
    下载: 导出CSV

    表  8  不同编码器在 Pascal VOC 和 MS COCO 数据集上的检测精度

    编码器Pascal VOCMS COCOFPS耗时(ms)
    ViT-B69.6055.402835.70
    ViT-L70.2056.101855.60
    ViT-H70.8056.8010100.00
    下载: 导出CSV
  • [1] LIU Quanyong, PENG Jiangtao, ZHANG Genwei, et al. Deep contrastive learning network for small-sample hyperspectral image classification[J]. Journal of Remote Sensing, 2023, 3: 0025. doi: 10.34133/remotesensing.0025.
    [2] 邵延华, 张铎, 楚红雨, 等. 基于深度学习的YOLO目标检测综述[J]. 电子与信息学报, 2022, 44(10): 3697–3708. doi: 10.11999/JEIT210790.

    SHAO Yanhua, ZHANG Duo, CHU Hongyu, et al. A review of YOLO object detection based on deep learning[J]. Journal of Electronics & Information Technology, 2022, 44(10): 3697–3708. doi: 10.11999/JEIT210790.
    [3] MEI Shaohui, LIAN Jiawei, WANG Xiaofei, et al. A comprehensive study on the robustness of deep learning-based image classification and object detection in remote sensing: Surveying and benchmarking[J]. Journal of Remote Sensing, 2024, 4: 0219. doi: 10.34133/remotesensing.0219.
    [4] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 580–587. doi: 10.1109/CVPR.2014.81.
    [5] REDMON J and FARHADI A. YOLO9000: Better, faster, stronger[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7263–7271. doi: 10.1109/CVPR.2017.690.
    [6] LAW H and DENG Jia. CornerNet: Detecting objects as paired keypoints[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 734–750. doi: 10.1007/978-3-030-01264-9_45.
    [7] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]. The 16th European Conference on Computer Vision, Glasgow, UK, 2020: 213–229. doi: 10.1007/978-3-030-58452-8_13.
    [8] HAN Guangxing and LIM S N. Few-shot object detection with foundation models[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 28608–28618. doi: 10.1109/cvpr52733.2024.02703.
    [9] NASEER A, ALZAHRANI H A, ALMUJALLY N A, et al. Efficient multi-object recognition using GMM segmentation feature fusion approach[J]. IEEE Access, 2024, 12: 37165–37178. doi: 10.1109/ACCESS.2024.3372190.
    [10] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255. doi: 10.1109/CVPR.2009.5206848.
    [11] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
    [12] KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything[C]. 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 4015–4026. doi: 10.1109/ICCV51070.2023.00371.
    [13] LIU Shu, QI Lu, QIN Haifang, et al. Path aggregation network for instance segmentation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8759–8768. doi: 10.1109/CVPR.2018.00913.
    [14] MILIOTO A, MANDTLER L, and STACHNISS C. Fast instance and semantic segmentation exploiting local connectivity, metric learning, and one-shot detection for robotics[C]. 2019 International Conference on Robotics and Automation, Montreal, Canada, 2019: 20–24. doi: 10.1109/ICRA.2019.8793593.
    [15] ZHANG Wenwen, HU Yun, SHAN Hangguan, et al. Exploring base-class suppression with prior guidance for bias-free one-shot object detection[C]. The 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 7314–7322. doi: 10.1609/aaai.v38i7.28561.
    [16] DALAL N and TRIGGS B. Histograms of oriented gradients for human detection[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, 2005: 886–893. doi: 10.1109/CVPR.2005.177.
    [17] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2961–2969. doi: 10.1109/ICCV.2017.322.
    [18] 刘佳琳. 基于改进YOLOv8的行人和车辆检测算法的研究[D]. [硕士论文], 大连交通大学, 2025. doi: 10.26990/d.cnki.gsltc.2025.000813.

    LIU Jialin. Research on improved YOLOv8-based pedestrian and vehicle detection algorithm[D]. [Master dissertation], Dalian Jiaotong University, 2025. doi: 10.26990/d.cnki.gsltc.2025.000813.
    [19] LIU Shilong, LI Feng, ZHANG Hao, et al. DAB-DETR: Dynamic anchor boxes are better queries for DETR[C]. The 10th International Conference on Learning Representations, 2022.
    [20] ZHAO Yian, LÜ Wenyu, XU Shangliang, et al. DETRs beat YOLOs on real-time object detection[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 16965–16974. doi: 10.1109/CVPR52733.2024.01605.
    [21] WANG Yaqing, YAO Quanming, KWOK J T, et al. Generalizing from a few examples: A survey on few-shot learning[J]. ACM Computing Surveys (CSUR), 2021, 53(3): 63. doi: 10.1145/3386252.
    [22] SUN Bo, LI Banghuai, CAI Shengcai, et al. FSCE: Few-shot object detection via contrastive proposal encoding[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 7352–7362. doi: 10.1109/CVPR46437.2021.00727.
    [23] HU Hanzhe, BAI Shuai, LI Aoxue, et al. Dense relation distillation with context-aware aggregation for few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 10185–10194. doi: 10.1109/CVPR46437.2021.01005.
    [24] KANG Bingyi, LIU Zhuang, WANG Xin, et al. Few-shot object detection via feature reweighting[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 8420–8429. doi: 10.1109/ICCV.2019.00851.
    [25] HAN Guangxing, MA Jiawei, HUANG Shiyuan, et al. Few-shot object detection with fully cross-transformer[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 5321–5330. doi: 10.1109/CVPR52688.2022.00525.
    [26] FU Kun, ZHANG Tengfei, ZHANG Yue, et al. OSCD: A one-shot conditional object detection framework[J]. Neurocomputing, 2021, 425: 243–255. doi: 10.1016/j.neucom.2020.04.092.
    [27] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. The 9th International Conference on Learning Representations, 2021.
    [28] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2117–2125. doi: 10.1109/CVPR.2017.106.
    [29] LI Bo, WU Wei, WANG Qiang, et al. SiamRPN++: Evolution of siamese visual tracking with very deep networks[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 4282–4291. doi: 10.1109/CVPR.2019.00441.
    [30] HSIEH T I, LO Y C, CHEN H T, et al. One-shot object detection with co-attention and co-excitation[C]. The 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 245.
    [31] CHEN T I, LIU Y C, SU H T, et al. Dual-awareness attention for few-shot object detection[J]. IEEE Transactions on Multimedia, 2023, 25: 291–301. doi: 10.1109/TMM.2021.3125195.
    [32] MELEKHOV I, TIULPIN A, SATTLER T, et al. DGC-Net: Dense geometric correspondence network[C]. 2019 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2019: 1034–1042. doi: 10.1109/WACV.2019.00115.
    [33] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. doi: 10.1007/s11263-009-0275-4.
    [34] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 740–755. doi: 10.1007/978-3-319-10602-1_48.
    [35] LI Bo, YAN Junjie, WU Wei, et al. High performance visual tracking with Siamese region proposal network[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8971–8980. doi: 10.1109/CVPR.2018.00935.
    [36] CHEN Dingjie, HSIEH H Y, and LIU T L. Adaptive image transformer for one-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville: IEEE, 2021: 12247–12256. doi: 10.1109/CVPR46437.2021.01207.
    [37] YANG Hanqing, CAI Sijia, SHENG Hualian, et al. Balanced and hierarchical relation learning for one-shot object detection[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 7591–7600. doi: 10.1109/CVPR52688.2022.00744.
    [38] INKAWHICH M, INKAWHICH N, YANG Hao, et al. OSR-ViT: A simple and modular framework for open-set object detection and discovery[C]. 2024 IEEE International Conference on Big Data, Washington, USA, 2024: 928–937. doi: 10.1109/BigData62323.2024.10826036.
    [39] HU Weiming, WANG Qiang, ZHANG Li, et al. SiamMask: A framework for fast online object tracking and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3072–3089. doi: 10.1109/TPAMI.2022.3172932.
  • 加载中
图(9) / 表(9)
计量
  • 文章访问数:  86
  • HTML全文浏览量:  31
  • PDF下载量:  8
  • 被引次数: 0
出版历程
  • 收稿日期:  2025-09-25
  • 修回日期:  2025-11-23
  • 录用日期:  2026-01-05
  • 网络出版日期:  2026-01-09

目录

    /

    返回文章
    返回