Sample Generation Based on Conditional Diffusion Model for Few-Shot Object Detection
-
Abstract: Using generative models to supply additional samples for few-shot object detection is one way to alleviate sample scarcity. Existing sample-generation methods mostly focus on the diversity of the generated samples while neglecting their quality and representativeness. To address this problem, this paper proposes FQRS, a new data-generation-based framework for few-shot object detection. First, an inter-class conditional control module is constructed so that the data generator can learn the relations between different classes; the inter-class relations between base and novel classes help the model estimate the distributions of the novel classes, thereby improving the quality of the generated samples. Second, an intra-class conditional control module is designed that uses Intersection over Union (IOU) information to constrain where generated samples fall in the feature space; by steering generated samples toward their class centers, it ensures that they capture the key features of the corresponding class, thereby improving the representativeness of the generated samples. Evaluated on the PASCAL VOC and MS COCO datasets under various few-shot settings, the proposed model consistently outperforms the current best two-stage fine-tuning-based object detector, Decoupled Faster Region-based Convolutional Neural Network (DeFRCN). The experiments verify that the proposed method achieves excellent few-shot object detection performance.
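The inter-class condition described above is built from cosine similarities between class semantics. The following is a minimal sketch of how such a semantic relation embedding could be computed; it is not the authors' code, and the use of text embeddings of the class names (e.g., CLIP [21]) as the class semantics is an assumption made for illustration.

```python
# Minimal sketch of an inter-class semantic relation embedding based on
# cosine similarity, as described in the abstract. Not the authors' code;
# the source of the per-class semantic vectors (e.g., CLIP text embeddings
# of the class names) is an assumption made for illustration.
import torch
import torch.nn.functional as F

def semantic_relation_embedding(class_semantics: torch.Tensor) -> torch.Tensor:
    """class_semantics: (C, d) matrix with one semantic vector per class
    (base and novel). Returns a (C, C) matrix whose row i holds the cosine
    similarity of class i to every class; row i can then serve as the
    inter-class condition when generating samples for class i."""
    sem = F.normalize(class_semantics, dim=-1)  # unit-norm semantic vectors
    return sem @ sem.t()                        # pairwise cosine similarities

# Example: 15 base classes + 5 novel classes with 512-dim semantics.
semantics = torch.randn(20, 512)
relation = semantic_relation_embedding(semantics)  # shape (20, 20)
novel_conditions = relation[15:]                   # conditions for the 5 novel classes
```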
Objective: Deep learning-based object detection typically requires a large volume of high-quality annotated samples, which limits its practical applicability. Few-Shot Object Detection (FSOD) has therefore gained significant attention as a promising research area: it leverages base classes with abundant labeled data to recognize novel classes with only a few training samples. Several methods based on generative models have been proposed to address the scarcity of annotated data in FSOD, but limitations remain. (1) Most generative models fail to sufficiently capture the relationships between base and novel classes, which hinders their ability to estimate novel-class distributions accurately and degrades the quality of generated samples. (2) Existing methods often prioritize sample diversity and neglect representativeness; low representativeness can cause confusion between categories and reduce detection performance. Because the object detection network is trained on both original and generated samples, the quality of the generated samples directly affects detection performance, so these issues must be addressed. To this end, a novel framework for data generation in FSOD via additional high-Quality and Representative Samples (FQRS) is introduced. A conditional control module incorporating both inter-class and intra-class constraints improves the quality and representativeness of generated samples and, ultimately, the accuracy of FSOD.

Methods: The proposed architecture consists of a fine-tuning-based object detector and a data generator. First, the object detector is trained on base-class data. The pre-trained detector is then used to extract Region of Interest (RoI) features, which serve as training data for the generator. Once trained, the generator produces new samples for the novel classes. The data generator comprises a diffusion model for sample generation and an inter-class and intra-class conditional control module that guides the diffusion process. For inter-class conditional control, a semantic relation embedding is introduced that uses cosine similarity to represent the degree of correlation between classes, enabling the data generator to learn inter-class relations effectively; the relations between base and novel classes assist the diffusion model in estimating novel-class distributions, improving the quality of generated samples. For intra-class conditional control, Intersection over Union (IOU) information is used to constrain the position of generated samples within the corresponding feature space, ensuring that they cluster around their category centers, which enhances their representativeness and preserves important class characteristics. Finally, the object detector is fine-tuned on both the generated and the original training samples, with a hyperparameter in the loss function controlling the influence of the generated samples on training (a minimal sketch of this step follows the Conclusions).

Results and Discussions: The effectiveness and robustness of the proposed network are validated on two public datasets, PASCAL VOC and MS COCO, with detection accuracy evaluated using the mAP and mAP50 metrics. Quantitative comparisons (Tables 1 and 2) show that the proposed network outperforms existing methods on both datasets.
For example, on the MS COCO dataset under the 1-shot setting, the proposed method achieves a 16.9% relative improvement over the state-of-the-art DeFRCN approach (9.0 vs. 7.7 mAP, Table 2). A cross-domain experiment (Table 3), in which base-class and novel-class data come from different datasets, demonstrates the superior generalization capability of the proposed method. Visual comparisons (Fig. 5) show that the proposed method alleviates the missed detections and category confusion that arise from limited training data, thereby improving FSOD performance. Ablation studies (Tables 4, 5, and 6) confirm the efficacy of the proposed modules and reveal how different parameter configurations affect detection performance. t-SNE visualizations (Fig. 6) show that the inter-class and intra-class conditional control module tightens feature aggregation within each category while improving discriminability between categories and reducing categorical confusion. A quantitative analysis (Table 7) also examines the additional model complexity introduced by the data generator in terms of parameter count and floating-point operations.

Conclusions: This paper presents a novel data-generation-based framework for obtaining additional samples in FSOD. The framework integrates a data generator, built on a conditional diffusion model, into a fine-tuning-based object detection network. The generator learns category features together with inter-class relations, capturing distinct category characteristics and improving generalization to novel classes, and it enhances sample representativeness by constraining generated samples to cluster around category centers. These high-quality, representative samples facilitate the detector's training and improve FSOD accuracy. Under various few-shot settings, the proposed model outperforms the state-of-the-art fine-tuning-based object detection model, Decoupled Faster Region-based Convolutional Neural Network (DeFRCN), on both the PASCAL VOC and MS COCO datasets. Extensive experimental results validate the superiority of the proposed approach. -
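For concreteness, the sketch below illustrates how the intra-class IOU condition and the loss hyperparameter described in the Methods could enter the fine-tuning step. It is a minimal sketch under stated assumptions rather than the authors' implementation: `generator.sample` stands in for the trained conditional diffusion generator, `roi_head` for the detector's RoI classification head, only the classification term of the detection loss is shown, and the reading of Table 5's I as the IOU conditioning range and γ as the loss weight is an assumption.

```python
# Minimal sketch of fine-tuning on a mix of real and generated RoI features.
# Not the authors' code: `generator` and `roi_head` are placeholders, and only
# the RoI classification loss is shown. The IOU condition is drawn from a high
# range (e.g., [0.75, 1], cf. Table 5) so generated features stay near the
# class center, and gamma down-weights the loss on generated samples
# (assuming gamma corresponds to the γ reported in Table 5).
import torch
import torch.nn.functional as F

def finetune_step(roi_head, generator, real_feats, labels, relation,
                  gamma: float = 0.4, iou_range=(0.75, 1.0)) -> torch.Tensor:
    # Loss on the original few-shot training samples.
    loss_real = F.cross_entropy(roi_head(real_feats), labels)

    # Generate extra features for the same labels, conditioned on the
    # inter-class relation embedding and an IOU value near 1.
    iou = torch.empty(len(labels)).uniform_(*iou_range)
    with torch.no_grad():
        gen_feats = generator.sample(labels=labels,
                                     relation=relation[labels],
                                     iou=iou)
    loss_gen = F.cross_entropy(roi_head(gen_feats), labels)

    # gamma controls how strongly generated samples influence training.
    return loss_real + gamma * loss_gen
```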
Table 1  Comparison between the proposed method and other methods on the PASCAL VOC dataset

Method / shot | Novel Set 1 (1 / 2 / 3 / 5 / 10) | Novel Set 2 (1 / 2 / 3 / 5 / 10) | Novel Set 3 (1 / 2 / 3 / 5 / 10)
TFA[6] | 39.8 / 36.1 / 44.7 / 55.7 / 56.0 | 23.5 / 26.9 / 34.1 / 35.1 / 39.1 | 30.8 / 34.8 / 42.8 / 49.5 / 49.8
MPSR[23] | 41.7 / – / 51.4 / 55.2 / 61.8 | 24.4 / – / 39.2 / 39.9 / 47.8 | 35.6 / – / 42.3 / 48.0 / 49.7
TIP[24] | 27.7 / 36.5 / 43.3 / 50.2 / 59.6 | 22.7 / 30.1 / 33.8 / 40.9 / 46.9 | 21.7 / 30.6 / 38.1 / 44.5 / 50.9
DCNet[25] | 33.9 / 37.4 / 43.7 / 51.1 / 59.6 | 23.2 / 24.8 / 30.6 / 36.7 / 46.6 | 32.3 / 34.9 / 39.7 / 42.6 / 50.7
CME[26] | 41.5 / 47.5 / 50.4 / 58.2 / 60.9 | 27.2 / 30.2 / 41.4 / 42.5 / 46.8 | 34.3 / 39.6 / 45.1 / 48.3 / 51.5
SRR-FSD[10] | 47.8 / 50.5 / 51.3 / 55.2 / 56.8 | 32.5 / 35.3 / 39.1 / 40.8 / 43.8 | 40.1 / 41.5 / 44.3 / 46.9 / 46.4
FADI[27] | 50.3 / 54.8 / 54.2 / 59.3 / 63.2 | 30.6 / 35.0 / 40.3 / 42.8 / 48.0 | 45.7 / 49.7 / 49.1 / 55.0 / 59.6
DeFRCN[9] | 45.7 / 56.4 / 59.3 / 62.6 / 64.6 | 35.7 / 40.5 / 45.3 / 50.4 / 54.1 | 39.8 / 50.6 / 52.8 / 56.1 / 60.8
FCT[28] | 38.5 / 49.6 / 53.5 / 59.8 / 64.3 | 25.9 / 34.2 / 40.1 / 44.9 / 47.4 | 34.7 / 43.9 / 49.3 / 53.1 / 56.3
Meta-DETR[29] | 35.1 / 49.0 / 53.2 / 57.4 / 62.0 | 27.9 / 32.3 / 38.4 / 43.2 / 51.8 | 34.9 / 41.8 / 47.1 / 54.1 / 58.2
Ours (FQRS) | 47.8 / 56.6 / 59.3 / 63.2 / 65.6 | 37.5 / 43.1 / 47.5 / 52.0 / 56.3 | 40.2 / 50.8 / 53.8 / 56.6 / 62.4

Table 2  Comparison between the proposed method and other methods on the MS COCO dataset
Method | 1-shot | 2-shot | 3-shot | 5-shot | 10-shot | 30-shot
TFA[6] | 4.4 | 5.4 | 6.0 | 7.7 | 10.0 | 13.7
MPSR[23] | 5.1 | 6.7 | 7.4 | 8.7 | 9.8 | 14.1
FSDetView[30] | 4.5 | 6.6 | 7.2 | 10.7 | 12.5 | 14.7
TIP[24] | – | – | – | – | 16.3 | 18.3
DCNet[25] | – | – | – | – | 12.8 | 18.6
CME[26] | – | – | – | – | 15.1 | 16.9
SRR-FSD[10] | – | – | – | – | 11.3 | 14.7
FADI[27] | 5.7 | 7.0 | 8.6 | 10.1 | 12.2 | 16.1
DeFRCN[9] | 7.7 | 11.4 | 13.2 | 15.5 | 18.5 | 22.4
FCT[28] | 5.1 | 7.2 | 9.8 | 12.0 | 15.3 | 20.2
Meta-DETR[29] | 7.5 | – | 13.5 | 15.4 | 19.0 | 22.2
Ours (FQRS) | 9.0 | 12.2 | 13.6 | 15.7 | 18.6 | 22.6

Table 4  Effectiveness of the inter-class and intra-class conditional control modules
Inter-class conditional control (Semantic embedding / Semantic relation embedding) | Intra-class conditional control | mAP (%)
√ | 8.3
√ √ | 8.8
√ √ | 8.6
√ √ √ | 9.0

Table 5  Comparison of results with different parameter values
Range of I | γ | mAP (%)
[0.5, 1] | 0.4 | 8.7
[0.5, 0.75] | 0.4 | 8.2
[0.75, 1] | 0.3 | 8.8
[0.75, 1] | 0.5 | 8.7
[0.75, 1] | 0.4 | 9.0

Table 6  Comparison of results with different values of parameter T
T | mAP (%)
900 | 8.9
1000 | 9.0
1100 | 9.0
-
[1] LIU Qiankun, LIU Rui, ZHENG Bolun, et al. Infrared small target detection with scale and location sensitivity[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 17490–17499. doi: 10.1109/CVPR52733.2024.01656.
[2] ZHANG Gang, CHEN Junnan, GAO Guohuan, et al. SAFDNet: A simple and effective network for fully sparse 3D object detection[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 14477–14486. doi: 10.1109/CVPR52733.2024.01372.
[3] YE Mingqiao, KE Lei, LI Siyuan, et al. Cascade-DETR: Delving into high-quality universal object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 6704–6714. doi: 10.1109/ICCV51070.2023.00617.
[4] WANG C Y, BOCHKOVSKIY A, and LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 7464–7475. doi: 10.1109/CVPR52729.2023.00721.
[5] WANG Yuxiong, RAMANAN D, and HEBERT M. Meta-learning to detect rare objects[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 9925–9934. doi: 10.1109/ICCV.2019.01002.
[6] WANG Xin, HUANG T E, DARRELL T, et al. Frustratingly simple few-shot object detection[C]. The 37th International Conference on Machine Learning, 2020: 920.
[7] SUN Bo, LI Banghuai, CAI Shengcai, et al. FSCE: Few-shot object detection via contrastive proposal encoding[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 7352–7362. doi: 10.1109/CVPR46437.2021.00727.
[8] YAN Xiaopeng, CHEN Ziliang, XU Anni, et al. Meta R-CNN: Towards general solver for instance-level low-shot learning[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 9577–9586. doi: 10.1109/ICCV.2019.00967.
[9] QIAO Limeng, ZHAO Yuxuan, LI Zhiyuan, et al. DeFRCN: Decoupled faster R-CNN for few-shot object detection[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8681–8690. doi: 10.1109/ICCV48922.2021.00856.
[10] ZHU Chenchen, CHEN Fangyi, AHMED U, et al. Semantic relation reasoning for shot-stable few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 8782–8791. doi: 10.1109/CVPR46437.2021.00867.
[11] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[12] ZHANG Weilin and WANG Yuxiong. Hallucination improves few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 13008–13017. doi: 10.1109/CVPR46437.2021.01281.
[13] ZHU Pengkai, WANG Hanxiao, and SALIGRAMA V. Don't even look once: Synthesizing features for zero-shot detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 11693–11702. doi: 10.1109/CVPR42600.2020.01171.
[14] XU Jingyi, LE H, and SAMARAS D. Generating features with increased crop-related diversity for few-shot object detection[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 19713–19722. doi: 10.1109/CVPR52729.2023.01888.
[15] HO J, JAIN A, and ABBEEL P. Denoising diffusion probabilistic models[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 574.
[16] HO J and SALIMANS T. Classifier-free diffusion guidance[EB/OL]. https://arxiv.org/abs/2207.12598, 2022.
[17] QI Tianhao, FANG Shancheng, WU Yanze, et al. DEADiff: An efficient stylization diffusion model with disentangled representations[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 8693–8702. doi: 10.1109/CVPR52733.2024.00830.
[18] GARBER T and TIRER T. Image restoration by denoising diffusion models with iteratively preconditioned guidance[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 25245–25254. doi: 10.1109/CVPR52733.2024.02385.
[19] LI Muyang, CAI Tianle, CAO Jiaxin, et al. DistriFusion: Distributed parallel inference for high-resolution diffusion models[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 7183–7193. doi: 10.1109/CVPR52733.2024.00686.
[20] HUANG Ziqi, CHAN K C K, JIANG Yuming, et al. Collaborative diffusion for multi-modal face generation and editing[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 6080–6090. doi: 10.1109/CVPR52729.2023.00589.
[21] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]. The 38th International Conference on Machine Learning, 2021: 8748–8763.
[22] RONNEBERGER O, FISCHER P, and BROX T. U-net: Convolutional networks for biomedical image segmentation[C]. The 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 2015: 234–241. doi: 10.1007/978-3-319-24574-4_28.
[23] WU Jiaxi, LIU Songtao, HUANG Di, et al. Multi-scale positive sample refinement for few-shot object detection[C]. The 16th European Conference on Computer Vision, Glasgow, UK, 2020: 456–472. doi: 10.1007/978-3-030-58517-4_27.
[24] LI Aoxue and LI Zhenguo. Transformation invariant few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 3094–3102. doi: 10.1109/CVPR46437.2021.00311.
[25] HU Hanzhe, BAI Shuai, LI Aoxue, et al. Dense relation distillation with context-aware aggregation for few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 10185–10194. doi: 10.1109/CVPR46437.2021.01005.
[26] LI Bohao, YANG Boyu, LIU Chang, et al. Beyond max-margin: Class margin equilibrium for few-shot object detection[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 7363–7372. doi: 10.1109/CVPR46437.2021.00728.
[27] CAO Yuhang, WANG Jiaqi, JIN Ying, et al. Few-shot object detection via association and discrimination[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 1267.
[28] HAN Guangxing, MA Jiawei, HUANG Shiyuan, et al. Few-shot object detection with fully cross-transformer[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 5321–5330. doi: 10.1109/CVPR52688.2022.00525.
[29] ZHANG Gongjie, LUO Zhipeng, CUI Kaiwen, et al. Meta-DETR: Image-level few-shot detection with inter-class correlation exploitation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(11): 12832–12843. doi: 10.1109/TPAMI.2022.3195735.
[30] XIAO Yang, LEPETIT V, and MARLET R. Few-shot object detection and viewpoint estimation for objects in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3090–3106. doi: 10.1109/TPAMI.2022.3174072.
-