Sample Generation Based on Conditional Diffusion Model for Few-Shot Object Detection

MEI Tiancan, WANG Yaru, CHEN Yuanhao

Citation: MEI Tiancan, WANG Yaru, CHEN Yuanhao. Sample Generation Based on Conditional Diffusion Model for Few-Shot Object Detection[J]. Journal of Electronics & Information Technology, 2025, 47(4): 1182-1191. doi: 10.11999/JEIT240841

doi: 10.11999/JEIT240841
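Details
    Author biographies:

    MEI Tiancan: male, associate professor; research interests include computer vision, pattern recognition, and machine learning

    WANG Yaru: female, master's student; research interests include computer vision and object detection

    CHEN Yuanhao: male, master's student; research interests include deep learning and object detection

    Corresponding author:

    MEI Tiancan, mtc@whu.edu.cn

  • CLC number: TN911.73; TP391.41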

Sample Generation Based on Conditional Diffusion Model for Few-Shot Object Detection

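  • Abstract: Using a generative model to supply additional samples for few-shot object detection is one way to address sample scarcity. Existing sample-generation methods mostly focus on the diversity of the generated samples and neglect their quality and representativeness. To address this, the paper proposes FQRS, a new data-generation-based framework for few-shot object detection. First, an inter-class conditional control module is constructed so that the data generator can learn the relationships among categories; the inter-class relationship information between base classes and novel classes helps the model estimate the distribution of the novel classes, which improves the quality of the generated samples. Second, an intra-class conditional control module is designed that uses Intersection over Union (IoU) information to constrain where generated samples lie in feature space; steering the generated samples to cluster closer to the class center ensures that they capture the key features of the corresponding class, which improves their representativeness. In tests on the PASCAL VOC and MS COCO datasets under different few-shot settings, the proposed model surpasses the current best two-stage fine-tuning object detection model, Decoupled Faster Region-based Convolutional Neural Network (DeFRCN). The experiments verify that the proposed method achieves excellent detection performance in few-shot object detection.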
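The intra-class control module is described as conditioning generation on IoU. As a minimal sketch of how such an IoU value could be computed for candidate RoIs against a ground-truth box, the following assumes boxes in (x1, y1, x2, y2) corner format; the function names `iou` and `iou_conditions` are hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_conditions(rois, gt_box):
    """Return one IoU score per RoI (hypothetical helper; each score could
    serve as a scalar condition attached to that RoI's feature)."""
    return [iou(roi, gt_box) for roi in rois]


if __name__ == "__main__":
    gt = (0.0, 0.0, 2.0, 2.0)
    rois = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), (5.0, 5.0, 6.0, 6.0)]
    print(iou_conditions(rois, gt))
```

An RoI with a higher IoU overlaps the object more tightly, which is consistent with the abstract's idea that a high-IoU condition pushes generated samples toward the class center.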
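  • Figure 1  Schematic of inter-class relationships

    Figure 2  Architecture of the proposed FQRS

    Figure 3  Structure of the inter-class and intra-class control module

    Figure 4  Comparison of RoI regions with different IoU values and their features

    Figure 5  Comparison of detection results from different methods

    Figure 6  Comparison of samples generated under different control conditions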

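    Table 1  Comparison of the proposed method with other methods on the PASCAL VOC dataset

    | Method | Novel Set 1 | | | | | Novel Set 2 | | | | | Novel Set 3 | | | | |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | shot | 1 | 2 | 3 | 5 | 10 | 1 | 2 | 3 | 5 | 10 | 1 | 2 | 3 | 5 | 10 |
    | TFA[6] | 39.8 | 36.1 | 44.7 | 55.7 | 56.0 | 23.5 | 26.9 | 34.1 | 35.1 | 39.1 | 30.8 | 34.8 | 42.8 | 49.5 | 49.8 |
    | MPSR[23] | 41.7 | - | 51.4 | 55.2 | 61.8 | 24.4 | - | 39.2 | 39.9 | 47.8 | 35.6 | - | 42.3 | 48.0 | 49.7 |
    | TIP[24] | 27.7 | 36.5 | 43.3 | 50.2 | 59.6 | 22.7 | 30.1 | 33.8 | 40.9 | 46.9 | 21.7 | 30.6 | 38.1 | 44.5 | 50.9 |
    | DCNet[25] | 33.9 | 37.4 | 43.7 | 51.1 | 59.6 | 23.2 | 24.8 | 30.6 | 36.7 | 46.6 | 32.3 | 34.9 | 39.7 | 42.6 | 50.7 |
    | CME[26] | 41.5 | 47.5 | 50.4 | 58.2 | 60.9 | 27.2 | 30.2 | 41.4 | 42.5 | 46.8 | 34.3 | 39.6 | 45.1 | 48.3 | 51.5 |
    | SRR-FSD[10] | 47.8 | 50.5 | 51.3 | 55.2 | 56.8 | 32.5 | 35.3 | 39.1 | 40.8 | 43.8 | 40.1 | 41.5 | 44.3 | 46.9 | 46.4 |
    | FADI[27] | 50.3 | 54.8 | 54.2 | 59.3 | 63.2 | 30.6 | 35.0 | 40.3 | 42.8 | 48.0 | 45.7 | 49.7 | 49.1 | 55.0 | 59.6 |
    | DeFRCN[9] | 45.7 | 56.4 | 59.3 | 62.6 | 64.6 | 35.7 | 40.5 | 45.3 | 50.4 | 54.1 | 39.8 | 50.6 | 52.8 | 56.1 | 60.8 |
    | FCT[28] | 38.5 | 49.6 | 53.5 | 59.8 | 64.3 | 25.9 | 34.2 | 40.1 | 44.9 | 47.4 | 34.7 | 43.9 | 49.3 | 53.1 | 56.3 |
    | Meta-DETR[29] | 35.1 | 49.0 | 53.2 | 57.4 | 62.0 | 27.9 | 32.3 | 38.4 | 43.2 | 51.8 | 34.9 | 41.8 | 47.1 | 54.1 | 58.2 |
    | Proposed method | 47.8 | 56.6 | 59.3 | 63.2 | 65.6 | 37.5 | 43.1 | 47.5 | 52.0 | 56.3 | 40.2 | 50.8 | 53.8 | 56.6 | 62.4 |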

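    Table 2  Comparison of the proposed method with other methods on the MS COCO dataset

    | Method | 1-shot | 2-shot | 3-shot | 5-shot | 10-shot | 30-shot |
    |---|---|---|---|---|---|---|
    | TFA[6] | 4.4 | 5.4 | 6.0 | 7.7 | 10.0 | 13.7 |
    | MPSR[23] | 5.1 | 6.7 | 7.4 | 8.7 | 9.8 | 14.1 |
    | FSDetView[30] | 4.5 | 6.6 | 7.2 | 10.7 | 12.5 | 14.7 |
    | TIP[24] | - | - | - | - | 16.3 | 18.3 |
    | DCNet[25] | - | - | - | - | 12.8 | 18.6 |
    | CME[26] | - | - | - | - | 15.1 | 16.9 |
    | SRR-FSD[10] | - | - | - | - | 11.3 | 14.7 |
    | FADI[27] | 5.7 | 7.0 | 8.6 | 10.1 | 12.2 | 16.1 |
    | DeFRCN[9] | 7.7 | 11.4 | 13.2 | 15.5 | 18.5 | 22.4 |
    | FCT[28] | 5.1 | 7.2 | 9.8 | 12.0 | 15.3 | 20.2 |
    | Meta-DETR[29] | 7.5 | - | 13.5 | 15.4 | 19.0 | 22.2 |
    | Proposed method | 9.0 | 12.2 | 13.6 | 15.7 | 18.6 | 22.6 |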
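To make the comparison with DeFRCN on MS COCO concrete, the short sketch below computes the per-shot difference between the proposed method and DeFRCN from the values in Table 2. The variable names are illustrative only.

```python
# Values copied from Table 2 (MS COCO), one entry per shot setting.
shots = [1, 2, 3, 5, 10, 30]
defrcn = [7.7, 11.4, 13.2, 15.5, 18.5, 22.4]
proposed = [9.0, 12.2, 13.6, 15.7, 18.6, 22.6]

# Absolute gain in points, rounded to one decimal to match the table precision.
gains = {s: round(p - d, 1) for s, p, d in zip(shots, proposed, defrcn)}

for s, g in gains.items():
    print(f"{s}-shot: +{g}")
```

The gain is largest in the 1-shot setting (+1.3) and shrinks to between +0.1 and +0.2 from 5 shots upward.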

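    Table 3  Comparison of cross-domain results between the proposed method and other methods

    | Method | mAP (%) |
    |---|---|
    | MetaRCNN[8] | 37.4 |
    | MPSR[23] | 42.3 |
    | DeFRCN[9] | 55.9 |
    | Proposed method | 57.4 |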

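    Table 4  Validation of the effectiveness of the inter-class and intra-class control module

    | Inter-class conditional control: semantic embedding | Inter-class conditional control: semantic relation embedding | Intra-class conditional control | mAP (%) |
    |---|---|---|---|
    | | | | 8.3 |
    | | | | 8.8 |
    | | | | 8.6 |
    | | | | 9.0 |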

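    Table 5  Comparison of results for different parameter values

    | Range of I | $\gamma$ | mAP (%) |
    |---|---|---|
    | [0.5, 1] | 0.4 | 8.7 |
    | [0.5, 0.75] | 0.4 | 8.2 |
    | [0.75, 1] | 0.3 | 8.8 |
    | [0.75, 1] | 0.5 | 8.7 |
    | [0.75, 1] | 0.4 | 9.0 |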

    Table 6  Comparison of results for different values of the parameter T

    | T | mAP (%) |
    |---|---|
    | 900 | 8.9 |
    | 1000 | 9.0 |
    | 1100 | 9.0 |
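This page does not define T. Assuming it denotes the number of diffusion steps in the sense of denoising diffusion probabilistic models [15], the sketch below shows the standard DDPM forward (noising) process on a feature vector. The linear beta schedule from 1e-4 to 0.02 is the default from [15] and is an assumption here, not a setting confirmed for FQRS; the function names are hypothetical.

```python
import math
import random


def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_x = math.sqrt(alpha_bar[t])
    scale_n = math.sqrt(1.0 - alpha_bar[t])
    xt = [scale_x * x + scale_n * n for x, n in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = make_alpha_bar(T=1000)
    feature = [1.0, -2.0, 0.5]  # stand-in for an RoI feature vector
    xt, _ = q_sample(feature, t=999, alpha_bar=alpha_bar, rng=rng)
    print(alpha_bar[-1], xt)
```

Under this schedule the signal coefficient at the final step is already close to zero when T = 1000, so adding further steps changes the endpoint very little.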

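    Table 7  Comparison of model parameter counts and floating-point operations

    | Method | #Params (M) | FLOPs (G) |
    |---|---|---|
    | TFA[6] | 60.5 | 179.0 |
    | DeFRCN[9] (object detector used in this paper) | 100.9 | 318.2 |
    | Data generator (this paper) | 34.6 | 32.5 |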
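As a small worked calculation on Table 7, the sketch below expresses the size of the data generator relative to the DeFRCN detector. It only restates the table values as ratios and makes no assumption about when the generator is run.

```python
# Values copied from Table 7.
detector = {"params_m": 100.9, "flops_g": 318.2}   # DeFRCN[9]
generator = {"params_m": 34.6, "flops_g": 32.5}    # data generator

param_ratio = generator["params_m"] / detector["params_m"]
flops_ratio = generator["flops_g"] / detector["flops_g"]

print(f"parameters: {param_ratio:.1%} of the detector")
print(f"FLOPs:      {flops_ratio:.1%} of the detector")
```

The generator has roughly a third of the detector's parameters and about a tenth of its FLOPs.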
  • [1] LIU Qiankun, LIU Rui, ZHENG Bolun, et al. Infrared small target detection with scale and location sensitivity[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 17490–17499. doi: 10.1109/CVPR52733.2024.01656.
    [2] ZHANG Gang, CHEN Junnan, GAO Guohuan, et al. SAFDNet: A simple and effective network for fully sparse 3D object detection[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 14477–14486. doi: 10.1109/CVPR52733.2024.01372.
    [3] YE Mingqiao, KE Lei, LI Siyuan, et al. Cascade-DETR: Delving into high-quality universal object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 6704–6714. doi: 10.1109/ICCV51070.2023.00617.
    [4] WANG C Y, BOCHKOVSKIY A, and LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 7464–7475. doi: 10.1109/CVPR52729.2023.00721.
    [5] WANG Yuxiong, RAMANAN D, and HEBERT M. Meta-learning to detect rare objects[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 9925–9934. doi: 10.1109/ICCV.2019.01002.
    [6] WANG Xin, HUANG T E, DARRELL T, et al. Frustratingly simple few-shot object detection[C]. The 37th International Conference on Machine Learning, 2020: 920.
    [7] SUN Bo, LI Banghuai, CAI Shengcai, et al. FSCE: Few-shot object detection via contrastive proposal encoding[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 7352–7362. doi: 10.1109/CVPR46437.2021.00727.
    [8] YAN Xiaopeng, CHEN Ziliang, XU Anni, et al. Meta R-CNN: Towards general solver for instance-level low-shot learning[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 9577–9586. doi: 10.1109/ICCV.2019.00967.
    [9] QIAO Limeng, ZHAO Yuxuan, LI Zhiyuan, et al. DeFRCN: Decoupled faster R-CNN for few-shot object detection[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8681–8690. doi: 10.1109/ICCV48922.2021.00856.
    [10] ZHU Chenchen, CHEN Fangyi, AHMED U, et al. Semantic relation reasoning for shot-stable few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 8782–8791. doi: 10.1109/CVPR46437.2021.00867.
    [11] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
    [12] ZHANG Weilin and WANG Yuxiong. Hallucination improves few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 13008–13017. doi: 10.1109/CVPR46437.2021.01281.
    [13] ZHU Pengkai, WANG Hanxiao, and SALIGRAMA V. Don’t even look once: Synthesizing features for zero-shot detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 11693–11702. doi: 10.1109/CVPR42600.2020.01171.
    [14] XU Jingyi, LE H, and SAMARAS D. Generating features with increased crop-related diversity for few-shot object detection[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 19713–19722. doi: 10.1109/CVPR52729.2023.01888.
    [15] HO J, JAIN A, and ABBEEL P. Denoising diffusion probabilistic models[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 574.
    [16] HO J and SALIMANS T. Classifier-free diffusion guidance[EB/OL]. https://arxiv.org/abs/2207.12598, 2022.
    [17] QI Tianhao, FANG Shancheng, WU Yanze, et al. DEADiff: An efficient stylization diffusion model with disentangled representations[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 8693–8702. doi: 10.1109/CVPR52733.2024.00830.
    [18] GARBER T and TIRER T. Image restoration by denoising diffusion models with iteratively preconditioned guidance[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 25245–25254. doi: 10.1109/CVPR52733.2024.02385.
    [19] LI Muyang, CAI Tianle, CAO Jiaxin, et al. DistriFusion: Distributed parallel inference for high-resolution diffusion models[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 7183–7193. doi: 10.1109/CVPR52733.2024.00686.
    [20] HUANG Ziqi, CHAN K C K, JIANG Yuming, et al. Collaborative diffusion for multi-modal face generation and editing[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 6080–6090. doi: 10.1109/CVPR52729.2023.00589.
    [21] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]. The 38th International Conference on Machine Learning, 2021: 8748–8763.
    [22] RONNEBERGER O, FISCHER P, and BROX T. U-net: Convolutional networks for biomedical image segmentation[C]. The 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 2015: 234–241. doi: 10.1007/978-3-319-24574-4_28.
    [23] WU Jiaxi, LIU Songtao, HUANG Di, et al. Multi-scale positive sample refinement for few-shot object detection[C]. The 16th European Conference on Computer Vision, Glasgow, UK, 2020: 456–472. doi: 10.1007/978-3-030-58517-4_27.
    [24] LI Aoxue and LI Zhenguo. Transformation invariant few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 3094–3102. doi: 10.1109/CVPR46437.2021.00311.
    [25] HU Hanzhe, BAI Shuai, LI Aoxue, et al. Dense relation distillation with context-aware aggregation for few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 10185–10194. doi: 10.1109/CVPR46437.2021.01005.
    [26] LI Bohao, YANG Boyu, LIU Chang, et al. Beyond max-margin: Class margin equilibrium for few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 7363–7372. doi: 10.1109/CVPR46437.2021.00728.
    [27] CAO Yuhang, WANG Jiaqi, JIN Ying, et al. Few-shot object detection via association and discrimination[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 1267.
    [28] HAN Guangxing, MA Jiawei, HUANG Shiyuan, et al. Few-shot object detection with fully cross-transformer[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 5321–5330. doi: 10.1109/CVPR52688.2022.00525.
    [29] ZHANG Gongjie, LUO Zhipeng, CUI Kaiwen, et al. Meta-DETR: Image-level few-shot detection with inter-class correlation exploitation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(11): 12832–12843. doi: 10.1109/TPAMI.2022.3195735.
    [30] XIAO Yang, LEPETIT V, and MARLET R. Few-shot object detection and viewpoint estimation for objects in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3090–3106. doi: 10.1109/TPAMI.2022.3174072.
Publication history
  • Received:  2024-10-08
  • Revised:  2025-03-05
  • Published online:  2025-03-19
  • Issue date:  2025-04-10