Dynamic Focus and Semantic Prompt Network for Fine-Grained Pest Classification
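Summary: Agricultural pest images commonly suffer from complex background interference, pronounced appearance differences across morphological stages, diverse shooting angles, and large scale variation. As a result, existing fine-grained image classification models remain limited in feature extraction and in adapting to morphological changes. To address these problems, this paper constructs an Agricultural Pest Multi-Dimensional dataset (APMD) covering multiple morphological stages, viewing angles, and scales, and proposes a fine-grained pest classification network based on dynamic focus and semantic prompts (DFS-PestNet). The network uses a decoupled parallel architecture that combines a main feature stream with a prompt enhancement stream. A spatial deformation modeling module (SDP) dynamically focuses on key discriminative regions such as pest spots and wing veins, strengthening the extraction of subtle local features against complex backgrounds. A prompt guidance module (AHVP) fuses category semantics and spatial position information into shallow and middle-level features, improving adaptability to morphological changes across pest stages. Dual-branch saliency sampling (DSS) adaptively aggregates features of key pest body parts through learnable prototype parts and dual-branch saliency fusion, strengthening the recognition of small targets such as tiny pests and early-stage larvae. Experimental results show that the proposed model outperforms the baseline and existing mainstream methods on both the IP102 and APMD datasets, verifying its effectiveness and application potential for fine-grained pest classification in complex scenes and providing a technical reference for pest monitoring and precise control in smart agriculture.
Abstract: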
Objective
Agricultural pest images are often affected by complex background interference, large appearance differences across morphological stages, diverse shooting angles, and substantial scale variation. These factors limit both the feature extraction ability and the morphological adaptability of existing fine-grained classification models. To address these challenges, an Agricultural Pest Multi-Dimensional Dataset (APMD) covering multiple morphological stages, viewing angles, and object scales is constructed, and a Dynamic Focus and Semantic Prompt Network for fine-grained pest classification (DFS-PestNet) is proposed. The network adopts a decoupled parallel architecture that combines a main feature stream with a prompt enhancement stream. A Spatial Dependency Perception (SDP) module dynamically focuses on key discriminative regions, such as pest spots and wing veins, thereby improving the extraction of subtle local features under complex backgrounds. An Advanced Haptic-Visual Prompting (AHVP) module integrates category semantics and spatial position information into shallow and middle-level features, which improves adaptability to morphological variation across developmental stages. Dual-branch Saliency Sampling (DSS) adaptively aggregates features of essential pest body parts through learnable prototype components and dual-branch saliency fusion, which improves the recognition of small targets such as tiny pests and early-stage larvae. Experimental results show that the proposed model outperforms the baseline and mainstream methods on both the public IP102 dataset and the self-constructed APMD dataset. These results verify the effectiveness and application potential of the model in complex agricultural scenarios and provide a technical reference for intelligent pest monitoring and precise control in smart agriculture.
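The abstract does not give SDP's equations. Because the module is described as modeling spatial deformation and the reference list includes deformable convolutional networks [16], the following pure-Python sketch assumes a deformable-sampling design; it is an illustration of that technique, not the authors' implementation. Each tap of a 3×3 kernel samples the feature map at a fractional offset via bilinear interpolation, so the receptive field can bend toward parts such as spots or wing veins. The function names are hypothetical, and the offsets are passed in by hand, whereas a real network would predict them with a small learned layer.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a single-channel feature map (list of rows) at fractional (y, x).

    Positions outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight shrinks linearly with distance to the grid point.
                wy = 1.0 - abs(y - yy)
                wx = 1.0 - abs(x - xx)
                total += wy * wx * feat[yy][xx]
    return total


def deformable_conv3x3(feat, kernel, offsets):
    """3x3 deformable convolution (single channel, stride 1, zero padding).

    offsets[i][j] holds 9 (dy, dx) pairs, one per kernel tap, for output pixel (i, j).
    With all offsets at zero this reduces to an ordinary 3x3 convolution.
    """
    h, w = len(feat), len(feat[0])
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for t, (ky, kx) in enumerate(taps):
                dy, dx = offsets[i][j][t]
                # Each tap reads from its regular grid position shifted by its own offset.
                acc += kernel[ky + 1][kx + 1] * bilinear_sample(feat, i + ky + dy, j + kx + dx)
            out[i][j] = acc
    return out


if __name__ == "__main__":
    feat = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    center = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # identity kernel
    shift_right = [[[(0.0, 1.0)] * 9 for _ in range(3)] for _ in range(3)]
    print(deformable_conv3x3(feat, center, shift_right))
```

With an identity kernel and every offset set to (0, 1), each output pixel reads its right-hand neighbour, which shows how offsets relocate the sampling grid without changing the kernel weights.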
Methods
To improve classification accuracy under complex background interference and multi-morphological conditions, APMD is first constructed. It contains pest images covering different morphological stages, viewing angles, and scales: 15,660 images of 58 species, divided into training, validation, and test sets at a ratio of 7:2:1 (Fig. 1, Table 1). The dataset provides data support for research on fine-grained pest classification. DFS-PestNet is then proposed. In this network, the SDP module adaptively locates and enhances key discriminative pest regions; by reducing the effects of pose variation and complex background interference, it enables more accurate fine-grained feature extraction. The AHVP module embeds category semantics and spatial position information, guiding the network to focus on key discriminative features in each morphological stage and thereby improving recognition robustness under the large morphological changes of the pest life cycle. Furthermore, DSS adaptively aggregates features from essential pest body parts, which strengthens the recognition of challenging small targets in fine-grained pest classification.
Results and Discussions
The performance of DFS-PestNet in fine-grained pest classification is evaluated through multidimensional experiments. First, qualitative visualization is conducted. Grad-CAM heatmaps show that the baseline model is easily distracted by complex farmland backgrounds and plant stems, whereas DFS-PestNet effectively suppresses background noise and focuses on fine-grained discriminative parts, such as pest heads and antennae (Fig. 6).
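Every row of Table 1 is an exact 7:2:1 split of that row's images. The short Python check below recomputes the three columns from each row's total, assuming the split is applied within each order with integer counts and any rounding remainder assigned to training; the helper names are hypothetical.

```python
# (training, validation, test) image counts per order, copied from Table 1.
TABLE1 = {
    "Hemiptera": (4347, 1242, 621),
    "Lepidoptera": (3024, 864, 432),
    "Coleoptera": (1890, 540, 270),
    "Diptera": (567, 162, 81),
    "Hymenoptera": (378, 108, 54),
    "Acari": (378, 108, 54),
    "Orthoptera": (189, 54, 27),
    "Isoptera": (189, 54, 27),
}


def split_7_2_1(n_images):
    """Split an image count 7:2:1, giving any rounding remainder to training."""
    val = n_images * 2 // 10
    test = n_images // 10
    return n_images - val - test, val, test


def check_table(rows):
    """Confirm each row is an exact 7:2:1 split and return the grand total of images."""
    total = 0
    for order, counts in rows.items():
        n = sum(counts)
        if split_7_2_1(n) != counts:
            raise ValueError(f"{order}: {counts} is not a 7:2:1 split of {n}")
        total += n
    return total


if __name__ == "__main__":
    print(check_table(TABLE1))  # 15660
```

Running it prints 15660, the sum of the training, validation, and test columns of Table 1.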
The model also captures the features of tiny targets, such as leafhopper nymphs, and of pests at different life stages, such as Chilo suppressalis hidden within stems, noticeably better than the baseline. t-SNE visualization of the learned features further confirms that the proposed model reduces feature confusion in multi-morphological scenarios: projected into two dimensions, the high-dimensional features show clearer inter-class separation and tighter intra-class clustering (Fig. 7). Second, quantitative ablation and parameter optimization experiments are performed. The ablation studies validate the synergistic effect of the three proposed modules, SDP, AHVP, and DSS (Table 2). Combined, they raise the classification accuracy of the baseline model by 2.21 percentage points, from 75.03% to 77.24%, with all four evaluation metrics reaching their best values. Hyperparameter optimization identifies 6 as the optimal number of prompt position tokens and 0.2 as the optimal feature dropout rate (Fig. 8). With this configuration, the prompt tokens provide sufficient semantic representation, and the dropout rate balances the simulation of natural occlusion against model robustness. Finally, comparative experiments with mainstream state-of-the-art models are conducted. Compared with existing advanced Convolutional Neural Network (CNN) and Transformer architectures, such as Gate-ViT and EST, DFS-PestNet achieves the highest accuracies: 77.24% on the large-scale public IP102 dataset and 98.01% on the challenging self-constructed APMD dataset (Table 3, Table 4). It also attains the best precision and F1-score on both datasets and the best recall on APMD; on IP102 its recall (68.78%) is slightly below that of Gate-ViT (69.14%). Moreover, while maintaining high classification accuracy, the proposed model runs at 158 frames/s on IP102 and 164 frames/s on APMD, the highest inference speeds among the compared methods that report them.
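The text reports 6 prompt position tokens and a feature dropout rate of 0.2 but does not define how either is applied. As a hedged sketch, assuming that AHVP prepends one category-semantic prompt and six position prompts to the patch-token sequence, and that "feature dropout" zeroes whole patch tokens to mimic occluded regions, the two hyperparameters could be wired in as follows. The functions `prepend_prompts` and `token_dropout` are hypothetical, and the prompt vectors here are fixed stand-ins for learned parameters.

```python
import random


def prepend_prompts(patch_tokens, class_prompt, position_prompts):
    """Build [class prompt, position prompts..., patch tokens...] as a new token list."""
    return [list(class_prompt)] + [list(p) for p in position_prompts] + [list(t) for t in patch_tokens]


def token_dropout(tokens, p=0.2, keep_first=0, rng=None):
    """Zero whole tokens with probability p and rescale the survivors by 1 / (1 - p).

    Dropping an entire patch token mimics an occluded image region; the first
    `keep_first` tokens (the prompts) are never dropped.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / (1.0 - p)  # keeps the expected activation unchanged (inverted dropout)
    out = []
    for idx, tok in enumerate(tokens):
        if idx < keep_first:
            out.append(list(tok))
        elif rng.random() < p:
            out.append([0.0] * len(tok))
        else:
            out.append([v * scale for v in tok])
    return out


if __name__ == "__main__":
    dim, n_position_prompts = 4, 6  # 6 position prompts, the value selected in Fig. 8
    class_prompt = [0.5] * dim
    position_prompts = [[0.1 * (k + 1)] * dim for k in range(n_position_prompts)]
    patches = [[1.0] * dim for _ in range(16)]
    seq = prepend_prompts(patches, class_prompt, position_prompts)
    dropped = token_dropout(seq, p=0.2, keep_first=1 + n_position_prompts, rng=random.Random(0))
    print(len(dropped), sum(1 for t in dropped if t == [0.0] * dim))
```

Dropping whole tokens rather than individual channels is the design choice that makes the noise resemble occlusion: an entire image patch disappears, while the rescaling keeps the expected feature magnitude unchanged.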
In summary, DFS-PestNet combines high classification accuracy with high inference efficiency when extracting complex pest features across large scale variation and multiple morphological stages, which provides a practical basis for efficient deployment in smart agriculture.
Conclusions
To address multi-morphological variation and small-target recognition in fine-grained pest classification, the APMD dataset is constructed, and DFS-PestNet is proposed on the basis of the MPSA baseline. Specifically, the SDP module adaptively focuses on discriminative features that are invariant to pose and morphology. The AHVP module embeds category semantics and spatial position information into the shallow and middle layers of the network. The DSS module adaptively aggregates key body-part features to improve small-target recognition. Experimental results show that DFS-PestNet outperforms mainstream models on both the IP102 and APMD datasets across different developmental stages, angles, and scales. Future work will focus on lightweight model design for efficient edge deployment and on open-set recognition for early warning of unknown pest categories in complex real-world environments.
Table 1 Hierarchical taxonomy of the APMD dataset
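Order | Species | Morphological stages | Images (training) | Images (validation) | Images (test)
Hemiptera | 23 | 3 (egg/nymph/adult) | 4347 | 1242 | 621
Lepidoptera | 16 | 4 (egg/larva/pupa/adult) | 3024 | 864 | 432
Coleoptera | 10 | 4 (egg/larva/pupa/adult) | 1890 | 540 | 270
Diptera | 3 | 4 (egg/larva/pupa/adult) | 567 | 162 | 81
Hymenoptera | 2 | 4 (egg/larva/pupa/adult) | 378 | 108 | 54
Acari (mites and ticks) | 2 | 4 (egg/larva/pupa/adult) | 378 | 108 | 54
Orthoptera | 1 | 3 (egg/nymph/adult) | 189 | 54 | 27
Isoptera | 1 | 3 (egg/nymph/adult) | 189 | 54 | 27
Total | 58 | - | 10962 | 3132 | 1566
Table 2 Ablation results (%)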
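No. | SDP | AHVP | DSS | A | P | R | F1-score
1 | | | | 75.03 | 67.85 | 64.60 | 66.18
2 | √ | | | 75.73 | 69.10 | 65.90 | 67.46
3 | | √ | | 75.67 | 68.95 | 65.85 | 67.36
4 | | | √ | 75.38 | 68.40 | 65.30 | 66.81
5 | √ | √ | | 76.52 | 70.40 | 67.45 | 68.89
6 | √ | | √ | 76.49 | 70.35 | 67.30 | 68.79
7 | | √ | √ | 76.50 | 70.38 | 67.40 | 68.86
8 | √ | √ | √ | 77.24 | 71.64 | 68.78 | 70.18
Table 3 Comparison results on the IP102 dataset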
Model | Backbone | A (%) | P (%) | R (%) | F1-score (%) | FPS
VRFNet[18] | EfficientNet | 68.34 | 68.37 | 68.33 | 68.34 | -
EfficientNet B7[19] | EfficientNet | 70.01 | - | - | - | -
ViT[20] | ViT-B/16 | 73.40 | 68.17 | 66.89 | 66.52 | 70
IELT[21] | ViT-B/16 | 75.40 | 69.50 | 67.82 | 68.65 | 62
FFEL-Net[22] | ViT-B/16 | 76.21 | 68.44 | 66.86 | 67.64 | -
FRCF[23] | ViT-B/16 | 74.69 | - | - | - | 91
Gate-ViT[24] | ViT-B/16 | 76.10 | 70.11 | 69.14 | 69.62 | 109
EST[25] | Swin-B | 71.84 | 65.85 | 63.22 | 64.06 | 84
Ours (DFS-PestNet) | Swin-B | 77.24 | 71.64 | 68.78 | 70.18 | 158
Table 4 Comparison results on the APMD dataset
Model | Backbone | A (%) | P (%) | R (%) | F1-score (%) | FPS
ViT[20] | ViT-B/16 | 88.92 | 89.80 | 87.45 | 88.61 | 125
IELT[21] | ViT-B/16 | 90.24 | 90.90 | 89.21 | 90.05 | 117
Gate-ViT[24] | ViT-B/16 | 91.60 | 92.10 | 90.87 | 91.48 | 112
CLCA[26] | ViT-B/16 | 93.52 | 94.00 | 92.79 | 93.39 | 110
GLSim[27] | ViT-B/16 | 95.78 | 96.10 | 95.32 | 95.71 | 105
EST[25] | Swin-B | 93.45 | 93.82 | 93.10 | 93.46 | 138
Ours (DFS-PestNet) | Swin-B | 98.01 | 98.20 | 98.00 | 97.90 | 164
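Tables 2 to 4 report accuracy (A), precision (P), recall (R), and F1-score but do not state how P, R, and F1 are averaged over classes. As an assumption rather than the authors' evaluation code, the sketch below computes accuracy and macro-averaged P, R, and F1 from label lists, plus the harmonic mean of already-averaged P and R. The two F1 variants generally differ, so a comparison should use the same one throughout. For example, the harmonic mean of the ViT precision and recall on APMD reproduces its listed F1-score of 88.61. The function names are hypothetical.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, F1 (all in %).

    Classes are taken from y_true; a class that is never predicted gets precision 0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    classes = sorted(n_true)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = correct[c] / n_pred[c] if n_pred[c] else 0.0
        rec = correct[c] / n_true[c]
        f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    accuracy = sum(correct.values()) / len(y_true)
    return (100 * accuracy, 100 * sum(precisions) / k,
            100 * sum(recalls) / k, 100 * sum(f1s) / k)


def f1_from_pr(precision, recall):
    """Harmonic mean of already-averaged precision and recall (not the mean per-class F1)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(macro_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
    print(round(f1_from_pr(89.80, 87.45), 2))  # 88.61, the ViT F1-score on APMD in Table 4
```

In the toy example, the mean of the per-class F1 scores is about 73.33, whereas the harmonic mean of the macro precision (about 83.33) and macro recall (75.00) is about 78.95, which shows how much the choice of F1 variant can matter.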
[1] LU Yanhui, LIU Yang, YANG Xianming, et al. Advances in integrated management of agricultural insect pests in China: 2018-2022[J]. Plant Protection, 2023, 49(5): 145–166. doi: 10.16688/j.zwbh.2023207. (in Chinese)
[2] ZHAO Xueru, LI Hui, HU Xinyi, et al. Survey of automatic identification of field pests based on deep learning[J]. Journal of Image and Signal Processing, 2023, 12(2): 77–88. doi: 10.12677/JISP.2023.122008. (in Chinese)
[3] WU Xiaoping, ZHAN Chi, LAI Yukun, et al. IP102: A large-scale benchmark dataset for insect pest recognition[C]. The 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 8779–8788. doi: 10.1109/CVPR.2019.00899.
[4] CHEN Lei, LIU Libo, and WANG Xiaoli. A dataset of image-text cross-modal retrieval of Lycium barbarum pests in Ningxia in 2020[J]. China Scientific Data, 2022, 7(3): 1–8. doi: 10.11922/11-6035.nasdc.2021.0058.zh. (in Chinese)
[5] LI Yanfen, WANG Hanxiang, DANG L M, et al. Crop pest recognition in natural scenes using convolutional neural networks[J]. Computers and Electronics in Agriculture, 2020, 169: 105174. doi: 10.1016/j.compag.2019.105174.
[6] BOLLIS E, PEDRINI H, and AVILA S. Weakly supervised learning guided by activation mapping applied to a novel citrus pest benchmark[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, USA, 2020: 310–319. doi: 10.1109/CVPRW50498.2020.00043.
[7] FANG Mingwei, TAN Zhiping, TANG Yu, et al. Pest-ConFormer: A hybrid CNN-Transformer architecture for large-scale multi-class crop pest recognition[J]. Expert Systems with Applications, 2024, 255: 124833. doi: 10.1016/j.eswa.2024.124833.
[8] CHENG Zekai and XIA Wan. Fine-grained image classification on agricultural pest larvae[J]. IOP Conference Series: Earth and Environmental Science, 2021, 792: 012037. doi: 10.1088/1755-1315/792/1/012037.
[9] AMARATHUNGA D C, RATNAYAKE M N, GRUNDY J, et al. Fine-grained image classification of microscopic insect pest species: Western Flower thrips and Plague thrips[J]. Computers and Electronics in Agriculture, 2022, 203: 107462. doi: 10.1016/j.compag.2022.107462.
[10] WANG Linfeng, LIU Yong, LI Jiayao, et al. Based on the multi-scale information sharing network of fine-grained attention for agricultural pest detection[J]. PLoS One, 2023, 18(10): e0286732. doi: 10.1371/journal.pone.0286732.
[11] ZHAO Feng, GENG Miaomiao, LIU Hanqiang, et al. Convolutional neural network and vision Transformer-driven cross-layer multi-scale fusion network for hyperspectral image classification[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2237–2248. doi: 10.11999/JEIT231209. (in Chinese)
[12] WEN Hongli, HU Qinghao, HUANG Liwei, et al. Few-shot remote sensing image classification based on parameter-efficient vision transformer and multimodal guidance[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4689–4703. doi: 10.11999/JEIT250996. (in Chinese)
[13] SONG Wanying, LIU Yuchen, WANG Jie, et al. Entropy-driven adaptive fusion network for scene classification of high-resolution remote sensing images[J]. Journal of Electronics & Information Technology, 2025. doi: 10.11999/JEIT251147. (in Chinese)
[14] HAN Yuantao, ZHANG Cong, ZHAN Xiaoyun, et al. Crossing multiple life stages: Fine-grained classification of agricultural pests[J]. Plant Methods, 2024, 20(1): 191. doi: 10.1186/s13007-024-01317-w.
[15] WANG Jiahui, XU Qin, JIANG Bo, et al. Multi-granularity part sampling attention for fine-grained visual classification[J]. IEEE Transactions on Image Processing, 2024, 33: 4529–4542. doi: 10.1109/TIP.2024.3441813.
[16] DAI Jifeng, QI Haozhi, XIONG Yuwen, et al. Deformable convolutional networks[C]. The 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 764–773. doi: 10.1109/ICCV.2017.89.
[17] KIRKLAND E J. Advanced Computing in Electron Microscopy[M]. 2nd ed. New York: Springer, 2010: 261–263. doi: 10.1007/978-1-4419-6533-2.
[18] NANDHINI C and BRINDHA M. Visual regenerative fusion network for pest recognition[J]. Neural Computing and Applications, 2024, 36(6): 2867–2882. doi: 10.1007/s00521-023-09173-w.
[19] MALIK P and PARIDA M K. Classification of insect pest using transfer learning mechanism[C]. The 8th International Conference on Computer Vision and Image Processing, Jammu, India, 2023: 78–89. doi: 10.1007/978-3-031-58535-7_7.
[20] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. The 9th International Conference on Learning Representations, 2021.
[21] XU Qin, WANG Jiahui, JIANG Bo, et al. Fine-grained visual classification via internal ensemble learning transformer[J]. IEEE Transactions on Multimedia, 2023, 25: 9015–9028. doi: 10.1109/TMM.2023.3244340.
[22] ZHANG Wenli and SONG Wei. Fine-grained image classification based on feature fusion and ensemble learning[J]. Laser & Optoelectronics Progress, 2024, 61(22): 2237010. doi: 10.3788/LOP240759. (in Chinese)
[23] LIU Honglin, ZHAN Yongzhao, XIA Huifen, et al. Self-supervised transformer-based pre-training method using latent semantic masking auto-encoder for pest and disease classification[J]. Computers and Electronics in Agriculture, 2022, 203: 107448. doi: 10.1016/j.compag.2022.107448.
[24] LU Xiaowei, WANG Kanqi, WANG Peiyu, et al. Gate-ViT: Gated vision transformer for fine-grained visual classification[C]. The 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2025: 468–479. doi: 10.1007/978-981-96-8180-8_37.
[25] LIU Wei and ZHANG Ao. Plant disease detection algorithm based on efficient Swin transformer[J]. Computers, Materials & Continua, 2025, 82(2): 3045–3068. doi: 10.32604/cmc.2024.058640.
[26] RIOS E A, YUANDA J C, GHANZ V L, et al. Cross-layer cache aggregation for token reduction in ultra-fine-grained image recognition[C]. 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025: 1–5. doi: 10.1109/icassp49660.2025.10890489.
[27] RIOS E A, HU Minchun, and LAI Bocheng. Global-local similarity for efficient fine-grained image recognition with vision transformers[C]. 2025 IEEE International Symposium on Circuits and Systems (ISCAS), London, UK, 2025: 1–5. doi: 10.1109/ISCAS56072.2025.11043866.