Dynamic Focus and Semantic Prompt Network for Fine-Grained Pest Classification
-
Abstract:
Objective  Agricultural pest images are commonly affected by complex background interference, significant appearance differences across morphological stages, diverse shooting angles, and large scale variations. These issues expose clear deficiencies in the feature extraction and morphological adaptability of existing fine-grained classification models. To address them, an Agricultural Pest Multi-dimensional Dataset (APMD) covering multiple morphological stages, viewing angles, and object scales is constructed, and a fine-grained pest classification network based on dynamic focus and semantic prompts (DFS-PestNet) is proposed. A decoupled parallel architecture combining a main feature stream and a prompt enhancement stream is designed. Through a Spatial Dependency Perception (SDP) module, crucial discriminative regions (e.g., pest spots and wing veins) are dynamically focused upon to enhance the extraction of subtle local features under complex backgrounds. An Advanced Haptic-Visual Prompting (AHVP) module integrates category semantics and spatial position information into shallow and middle-level features, improving adaptability to morphological variations across developmental stages. In addition, Dual-Branch Saliency Sampling (DSS) adaptively aggregates critical features of essential pest body parts through learnable prototype components and dual-branch saliency fusion, enhancing the recognition of small targets such as tiny pests and early-stage larvae. Experimental results demonstrate that the proposed model achieves superior classification performance compared to baseline and mainstream methods on both the public and self-constructed datasets, validating its effectiveness and application potential in complex agricultural scenarios and providing a technical reference for intelligent pest monitoring and precise control in smart agriculture.

Methods  To tackle insufficient classification accuracy under complex background interference and multi-morphological conditions, the Agricultural Pest Multi-dimensional Dataset (APMD) is first constructed. It spans multiple pest morphological stages, viewing angles, and scales, containing 15,680 images of 58 species divided into training, validation, and test sets at a 7:2:1 ratio (Fig. 1, Table 1), and provides high-quality data support for further research on fine-grained pest classification. On this basis, the Dynamic Focus and Semantic Prompt Network for fine-grained pest classification (DFS-PestNet) is proposed. Within this architecture, the Spatial Dependency Perception (SDP) module adaptively locates and structurally enhances the key discriminative regions of pests, overcoming pose variations and complex background interference to achieve more accurate fine-grained feature extraction. The Advanced Haptic-Visual Prompting (AHVP) module is introduced into the network pipeline to embed category semantics and spatial position information.
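Although no reference implementation accompanies the abstract, the dynamic-focus behavior attributed to the SDP module is reminiscent of deformable convolution [16], where learned offsets bend the sampling grid toward discriminative regions such as spots and wing veins. Below is a minimal PyTorch sketch of such a block; the class name, layer sizes, and residual layout are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of a dynamic-focus block in the spirit of SDP, built on
# deformable convolution [16]. All names and sizes are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DynamicFocusBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Predict a 2-D offset for every kernel position so the receptive
        # field can bend toward discriminative parts (spots, wing veins).
        self.offset_conv = nn.Conv2d(
            channels, 2 * kernel_size * kernel_size, kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(
            channels, channels, kernel_size, padding=pad)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offset = self.offset_conv(x)       # (N, 2*k*k, H, W)
        y = self.deform_conv(x, offset)    # resample features at shifted positions
        return x + self.act(self.norm(y))  # residual keeps the main stream intact

feat = torch.randn(2, 256, 28, 28)
print(DynamicFocusBlock(256)(feat).shape)  # torch.Size([2, 256, 28, 28])
```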
This module guides the network to focus consistently on crucial discriminative features across different morphological periods, improving recognition robustness against the dramatic morphological changes throughout the pest life cycle. Furthermore, Dual-Branch Saliency Sampling (DSS) is proposed to adaptively aggregate the features of essential pest body parts, strengthening the recognition of challenging small targets and alleviating the inherent difficulty of small-target recognition in fine-grained pest classification.

Results and Discussions  The performance of DFS-PestNet in fine-grained pest classification is evaluated through multi-dimensional experiments. First, in qualitative visualization analysis, Grad-CAM heatmaps show that, unlike the baseline model, which is highly susceptible to interference from complex farmland backgrounds and plant stems, DFS-PestNet effectively suppresses background noise and focuses precisely on fine-grained discriminative parts such as pest heads and antennae (Fig. 6). Clear advantages are demonstrated in capturing tiny targets (e.g., leafhopper nymphs) and pests at different life stages (e.g., Chilo suppressalis hidden within stems). The t-SNE dimensionality reduction results further confirm that the proposed model alleviates feature confusion in multi-morphological scenarios, yielding clearer inter-class separation and tighter intra-class clustering in the two-dimensional visualization space (Fig. 7). Second, in quantitative experiments, the ablation studies confirm the synergistic effect of the three improved modules SDP, AHVP, and DSS (Table 2): their combination raises the baseline's classification accuracy by 2.21 percentage points to 77.24%, with all core evaluation metrics reaching their best values. Hyperparameter optimization determines the optimal number of prompt position tokens to be 6 and the optimal feature dropout rate to be 0.2 (Fig. 8); this configuration preserves complete semantic expression while best balancing the simulation of natural occlusion against overall model robustness. Finally, in comparison with state-of-the-art CNN and Transformer architectures such as Gate-ViT and EST, DFS-PestNet achieves the highest accuracies of 77.24% on the large-scale public dataset IP102 and 98.01% on the challenging self-constructed multi-dimensional dataset APMD (Table 3, Table 4), and leads on all fine-grained classification metrics. Moreover, while maintaining high classification accuracy, its inference speeds reach 158 frames/s and 164 frames/s on the two datasets, respectively.
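The reported optima (6 prompt position tokens, feature dropout rate 0.2) suggest a prompt-injection step of roughly the following form, in which learnable tokens are prepended to the patch tokens of a shallow or middle stage. This is a minimal sketch under those assumptions, not the AHVP module itself.

```python
# Hedged sketch of prompt-token injection in the spirit of AHVP. The sizes
# follow the reported optima (6 position tokens, dropout 0.2); the module
# structure itself is an illustrative assumption.
import torch
import torch.nn as nn

class PromptInjection(nn.Module):
    def __init__(self, dim: int, n_pos_tokens: int = 6, p_drop: float = 0.2):
        super().__init__()
        self.pos_prompts = nn.Parameter(torch.zeros(1, n_pos_tokens, dim))
        nn.init.trunc_normal_(self.pos_prompts, std=0.02)
        self.drop = nn.Dropout(p_drop)  # feature dropout, akin to natural occlusion

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (N, L, dim) patch tokens from a shallow/middle stage
        prompts = self.pos_prompts.expand(tokens.size(0), -1, -1)
        return torch.cat([prompts, self.drop(tokens)], dim=1)  # (N, 6+L, dim)

x = torch.randn(2, 196, 768)
print(PromptInjection(768)(x).shape)  # torch.Size([2, 202, 768])
```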
In summary, DFS-PestNet combines high classification accuracy with efficient inference in complex pest feature extraction across large scale variations and multiple morphological stages, laying a solid foundation for deployment in practical smart-agriculture scenarios.

Conclusions  To address multi-morphological variation and small-target recognition in fine-grained pest classification, the multi-dimensional dataset APMD is constructed and the DFS-PestNet model is proposed on top of the MPSA baseline. Specifically, the SDP module adaptively focuses on pose- and morphology-invariant discriminative features; the AHVP module embeds robust category semantics and spatial position information into the shallow and middle layers of the network; and the DSS module adaptively aggregates crucial body-part features to enhance small-target recognition (see the sketch below). Experimental results verify the superiority of DFS-PestNet over mainstream models on both the IP102 and APMD datasets across developmental stages, angles, and scales. Future work will explore lightweight model variants for efficient edge deployment and open-set recognition for early warning of unknown pest categories in complex real-world environments.
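The DSS idea of learnable prototype parts plus dual-branch saliency fusion can be pictured with the following minimal sketch, in which prototype queries cross-attend to saliency-reweighted feature tokens. Every detail here (the two saliency branches, attention-based pooling, part count) is an assumption for illustration, not the published design.

```python
# Hedged sketch of part aggregation in the spirit of DSS. Learnable prototype
# part queries cross-attend to the feature map; two simple saliency branches
# (activation energy and a learned 1x1 map) are fused to reweight spatial
# positions before aggregation. All details are assumptions.
import torch
import torch.nn as nn

class PartSampler(nn.Module):
    def __init__(self, dim: int, n_parts: int = 4):
        super().__init__()
        self.parts = nn.Parameter(torch.randn(n_parts, dim) * 0.02)  # prototype parts
        self.sal_conv = nn.Conv2d(dim, 1, kernel_size=1)             # learned saliency
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        n, c, h, w = fmap.shape
        tokens = fmap.flatten(2).transpose(1, 2)                 # (N, H*W, C)
        # Branch 1: parameter-free activation energy; Branch 2: learned map.
        s1 = fmap.pow(2).mean(1, keepdim=True)                   # (N, 1, H, W)
        s2 = self.sal_conv(fmap)                                 # (N, 1, H, W)
        sal = torch.sigmoid(s1 + s2).flatten(2).transpose(1, 2)  # (N, H*W, 1)
        q = self.parts.unsqueeze(0).expand(n, -1, -1)            # (N, P, C)
        parts, _ = self.attn(q, tokens * sal, tokens * sal)      # saliency-weighted
        return parts                                             # (N, P, C) part features

out = PartSampler(256)(torch.randn(2, 256, 14, 14))
print(out.shape)  # torch.Size([2, 4, 256])
```
-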
Table 1  Hierarchical classification scheme of the APMD dataset
Order | Species | Morphological stages | Images | Train | Val | Test
Hemiptera | 23 | 3 (egg/nymph/adult) | 6210 | 4347 | 1242 | 621
Lepidoptera | 16 | 4 (egg/larva/pupa/adult) | 4320 | 3024 | 864 | 432
Coleoptera | 10 | 4 (egg/larva/pupa/adult) | 2700 | 1890 | 540 | 270
Diptera | 3 | 4 (egg/larva/pupa/adult) | 810 | 567 | 162 | 81
Hymenoptera | 2 | 4 (egg/larva/pupa/adult) | 540 | 378 | 108 | 54
Acari | 2 | 4 (egg/larva/pupa/adult) | 540 | 378 | 108 | 54
Orthoptera | 1 | 3 (egg/nymph/adult) | 270 | 189 | 54 | 27
Isoptera | 1 | 3 (egg/nymph/adult) | 270 | 189 | 54 | 27
Total | 58 | | 15660 | 10962 | 3132 | 1566
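The per-order counts in Table 1 follow the stated 7:2:1 ratio exactly (e.g., for Hemiptera, 4347 = 0.7 × 6210, 1242 = 0.2 × 6210, and 621 = 0.1 × 6210). Below is a minimal sketch of producing such a stratified split, assuming scikit-learn; the file names and labels are placeholders.

```python
# Hedged sketch: a stratified 7:2:1 train/val/test split like the one
# described for APMD. Assumes scikit-learn; paths/labels are placeholders.
from sklearn.model_selection import train_test_split

def split_721(paths, labels, seed=42):
    # Peel off 30% for val+test, then split that 30% into 2:1 (val:test).
    tr_p, rest_p, tr_y, rest_y = train_test_split(
        paths, labels, test_size=0.3, stratify=labels, random_state=seed)
    va_p, te_p, va_y, te_y = train_test_split(
        rest_p, rest_y, test_size=1/3, stratify=rest_y, random_state=seed)
    return (tr_p, tr_y), (va_p, va_y), (te_p, te_y)

paths = [f"img_{i:03d}.jpg" for i in range(100)]
labels = [i % 2 for i in range(100)]
(tr, _), (va, _), (te, _) = split_721(paths, labels)
print(len(tr), len(va), len(te))  # 70 20 10
```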
Table 2  Ablation results
No. | SDP | AHVP | DSS | A/% | P/% | R/% | F1-score/%
1 | | | | 75.03 | 67.85 | 64.60 | 66.18
2 | √ | | | 75.73 | 69.10 | 65.90 | 67.46
3 | | √ | | 75.67 | 68.95 | 65.85 | 67.36
4 | | | √ | 75.38 | 68.40 | 65.30 | 66.81
5 | √ | √ | | 76.52 | 70.40 | 67.45 | 68.89
6 | √ | | √ | 76.49 | 70.35 | 67.30 | 68.79
7 | | √ | √ | 76.50 | 70.38 | 67.40 | 68.86
8 | √ | √ | √ | 77.24 | 71.64 | 68.78 | 70.18
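In Tables 2 to 4, A denotes overall accuracy and P, R, and F1-score denote class-averaged precision, recall, and F1. A minimal sketch of computing them follows, assuming scikit-learn and macro averaging (the averaging scheme is not stated in the source).

```python
# Hedged sketch of the A/P/R/F1 columns. Macro averaging is an assumption;
# the source does not state the averaging scheme.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def classification_metrics(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"A/%": 100 * acc, "P/%": 100 * p, "R/%": 100 * r, "F1/%": 100 * f1}

print(classification_metrics([0, 1, 2, 2, 1], [0, 1, 2, 1, 1]))
```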
Table 3  Comparison results on the IP102 dataset
Model | Backbone | A/% | P/% | R/% | F1-score/% | FPS
VRFNet[18] | EfficientNet | 68.34 | 68.37 | 68.33 | 68.34 | -
EfficientNet B7[19] | EfficientNet | 70.01 | - | - | - | -
ViT[20] | ViT-B/16 | 73.40 | 68.17 | 66.89 | 66.52 | 70
IELT[21] | ViT-B/16 | 75.40 | 69.50 | 67.82 | 68.65 | 62
FFEL-Net[22] | ViT-B/16 | 76.21 | 68.44 | 66.86 | 67.64 | -
FRCF[23] | ViT-B/16 | 74.69 | - | - | - | 91
Gate-ViT[24] | ViT-B/16 | 76.10 | 70.11 | 69.14 | 69.62 | 109
EST[25] | Swin-B | 71.84 | 65.85 | 63.22 | 64.06 | 84
Ours (DFS-PestNet) | Swin-B | 77.24 | 71.64 | 68.78 | 70.18 | 158
Table 4  Comparison results on the APMD dataset
Model | Backbone | A/% | P/% | R/% | F1-score/% | FPS
ViT[20] | ViT-B/16 | 88.92 | 89.80 | 87.45 | 88.61 | 125
IELT[21] | ViT-B/16 | 90.24 | 90.90 | 89.21 | 90.05 | 117
Gate-ViT[24] | ViT-B/16 | 91.60 | 92.10 | 90.87 | 91.48 | 112
CLCA[26] | ViT-B/16 | 93.52 | 94.00 | 92.79 | 93.39 | 110
GLSim[27] | ViT-B/16 | 95.78 | 96.10 | 95.32 | 95.71 | 105
EST[25] | Swin-B | 93.45 | 93.82 | 93.10 | 93.46 | 138
Ours (DFS-PestNet) | Swin-B | 98.01 | 98.20 | 98.00 | 97.90 | 164
-
[1] LU Yanhui, LIU Yang, YANG Xianming, et al. Advances in integrated management of agricultural insect pests in China: 2018-2022[J]. Plant Protection, 2023, 49(5): 145–166. doi: 10.16688/j.zwbh.2023207.
[2] ZHAO Xueru, LI Hui, HU Xinyi, et al. Survey of automatic identification of field pests based on deep learning[J]. Journal of Image and Signal Processing, 2023, 12(2): 77–88. doi: 10.12677/JISP.2023.122008.
[3] WU Xiaoping, ZHAN Chi, LAI Yukun, et al. IP102: A large-scale benchmark dataset for insect pest recognition[C]. The 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 8779–8788. doi: 10.1109/CVPR.2019.00899.
[4] CHEN Lei, LIU Libo, and WANG Xiaoli. A dataset of image-text cross-modal retrieval of Lycium barbarum pests in Ningxia in 2020[J]. China Scientific Data, 2022, 7(3): 1–8. doi: 10.11922/11-6035.nasdc.2021.0058.zh.
[5] LI Yanfen, WANG Hanxiang, DANG L M, et al. Crop pest recognition in natural scenes using convolutional neural networks[J]. Computers and Electronics in Agriculture, 2020, 169: 105174. doi: 10.1016/j.compag.2019.105174.
[6] BOLLIS E, PEDRINI H, and AVILA S. Weakly supervised learning guided by activation mapping applied to a novel citrus pest benchmark[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, USA, 2020: 310–319. doi: 10.1109/CVPRW50498.2020.00043.
[7] FANG Mingwei, TAN Zhiping, TANG Yu, et al. Pest-ConFormer: A hybrid CNN-Transformer architecture for large-scale multi-class crop pest recognition[J]. Expert Systems with Applications, 2024, 255: 124833. doi: 10.1016/j.eswa.2024.124833.
[8] CHENG Zekai and XIA Wan. Fine-grained image classification on agricultural pest larvae[J]. IOP Conference Series: Earth and Environmental Science, 2021, 792: 012037. doi: 10.1088/1755-1315/792/1/012037.
[9] AMARATHUNGA D C, RATNAYAKE M N, GRUNDY J, et al. Fine-grained image classification of microscopic insect pest species: Western Flower thrips and Plague thrips[J]. Computers and Electronics in Agriculture, 2022, 203: 107462. doi: 10.1016/j.compag.2022.107462.
[10] WANG Linfeng, LIU Yong, LI Jiayao, et al. Based on the multi-scale information sharing network of fine-grained attention for agricultural pest detection[J]. PLoS One, 2023, 18(10): e0286732. doi: 10.1371/journal.pone.0286732.
[11] ZHAO Feng, GENG Miaomiao, LIU Hanqiang, et al. Convolutional neural network and vision Transformer-driven cross-layer multi-scale fusion network for hyperspectral image classification[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2237–2248. doi: 10.11999/JEIT231209.
[12] WEN Hongli, HU Qinghao, HUANG Liwei, et al. Few-shot remote sensing image classification based on parameter-efficient vision transformer and multimodal guidance[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4689–4703. doi: 10.11999/JEIT250996.
[13] SONG Wanying, LIU Yuchen, WANG Jie, et al. Entropy-driven adaptive fusion network for scene classification of high-resolution remote sensing images[J/OL]. Journal of Electronics & Information Technology, 2025. https://link.cnki.net/urlid/11.4494.TN.20260405.2112.010.
[14] HAN Yuantao, ZHANG Cong, ZHAN Xiaoyun, et al. Crossing multiple life stages: Fine-grained classification of agricultural pests[J]. Plant Methods, 2024, 20(1): 191. doi: 10.1186/s13007-024-01317-w.
[15] WANG Jiahui, XU Qin, JIANG Bo, et al. Multi-granularity part sampling attention for fine-grained visual classification[J]. IEEE Transactions on Image Processing, 2024, 33: 4529–4542. doi: 10.1109/TIP.2024.3441813.
[16] DAI Jifeng, QI Haozhi, XIONG Yuwen, et al. Deformable convolutional networks[C]. The 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 764–773. doi: 10.1109/ICCV.2017.89.
[17] KIRKLAND E J. Advanced Computing in Electron Microscopy[M]. 2nd ed. New York: Springer, 2010: 261–263. doi: 10.1007/978-1-4419-6533-2.
[18] NANDHINI C and BRINDHA M. Visual regenerative fusion network for pest recognition[J]. Neural Computing and Applications, 2024, 36(6): 2867–2882. doi: 10.1007/s00521-023-09173-w.
[19] MALIK P and PARIDA M K. Classification of insect pest using transfer learning mechanism[C]. The 8th International Conference on Computer Vision and Image Processing, Jammu, India, 2023: 78–89. doi: 10.1007/978-3-031-58535-7_7.
[20] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. The 9th International Conference on Learning Representations, 2021.
[21] XU Qin, WANG Jiahui, JIANG Bo, et al. Fine-grained visual classification via internal ensemble learning transformer[J]. IEEE Transactions on Multimedia, 2023, 25: 9015–9028. doi: 10.1109/TMM.2023.3244340.
[22] ZHANG Wenli and SONG Wei. Fine-grained image classification based on feature fusion and ensemble learning[J]. Laser & Optoelectronics Progress, 2024, 61(22): 2237010. doi: 10.3788/LOP240759.
[23] LIU Honglin, ZHAN Yongzhao, XIA Huifen, et al. Self-supervised transformer-based pre-training method using latent semantic masking auto-encoder for pest and disease classification[J]. Computers and Electronics in Agriculture, 2022, 203: 107448. doi: 10.1016/j.compag.2022.107448.
[24] LU Xiaowei, WANG Kanqi, WANG Peiyu, et al. Gate-ViT: Gated vision transformer for fine-grained visual classification[C]. The 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2025: 468–479. doi: 10.1007/978-981-96-8180-8_37.
[25] LIU Wei and ZHANG Ao. Plant disease detection algorithm based on efficient Swin transformer[J]. Computers, Materials & Continua, 2025, 82(2): 3045–3068. doi: 10.32604/cmc.2024.058640.
[26] RIOS E A, YUANDA J C, GHANZ V L, et al. Cross-layer cache aggregation for token reduction in ultra-fine-grained image recognition[C]. 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025: 1–5. doi: 10.1109/icassp49660.2025.10890489.
[27] RIOS E A, HU Minchun, and LAI Bocheng. Global-local similarity for efficient fine-grained image recognition with vision transformers[C]. 2025 IEEE International Symposium on Circuits and Systems (ISCAS), London, UK, 2025: 1–5. doi: 10.1109/ISCAS56072.2025.11043866.
-