Differentiable Sparse Mask Guided Infrared Small Target Fast Detection Network
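Summary: Infrared small target detection has irreplaceable application value in remote sensing detection, infrared guidance, and environmental monitoring. Its core challenge is that targets occupy very few pixels (typically smaller than 9 × 9), have sparse spatial features, and are easily submerged in complex background clutter. Existing methods either rely on hand-crafted background suppression operators, which adapt poorly to complex scenes, or use dense convolutional neural networks, which do not fully account for the computational redundancy caused by the extreme imbalance between target and background pixels. Based on the target sparsity prior, this paper proposes a differentiable sparse mask guided network for fast infrared small target detection. First, a differentiable sparse mask generation module is designed as a preprocessing step; it outputs a binary mask of candidate target regions, providing coarse detection and filtering out most redundant background information. Second, a sparse feature extraction module is built on Minkowski Engine sparse convolution; it performs sparse convolution only on the non-zero target regions of the binary mask, refining the candidate target regions. Finally, a pyramid pooling module fuses multi-scale features, and the fused features are fed into a target-background binary classifier to produce the final detection result. Experiments on the two mainstream infrared small target datasets NUDT-SIRST and NUAA-SIRST show that the proposed method greatly improves detection efficiency while maintaining comparable detection performance, which verifies its effectiveness.

Abstract: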
Objective Infrared small target detection has significant and irreplaceable application value in infrared guidance, environmental monitoring, and security surveillance. Its relevance is reflected in early warning, precision targeting, and pollution tracking, where timely and accurate detection is required. The core challenges arise from the inherent properties of infrared small targets: extremely small size (typically less than 9 × 9 pixels), limited spatial features due to long imaging distance, and a high likelihood of being submerged in complex, cluttered backgrounds such as clouds, sea glint, or urban thermal noise. These factors make it hard for conventional approaches to separate true targets from background clutter reliably. Existing methods are generally divided into traditional model-based techniques and modern deep learning techniques. Traditional methods rely on manually designed background suppression operators, such as morphological filters (e.g., Top-Hat) or low-rank matrix recovery (e.g., IPI). Although interpretable in simple scenes, they adapt poorly to dynamic and complex environments, leading to high false alarm rates and limited robustness. Deep learning methods, particularly dense Convolutional Neural Networks (CNNs), achieve improved performance through data-driven feature learning. However, they do not sufficiently address the extreme imbalance between target and background pixels, with targets usually accounting for less than 1% of an image. As a result, substantial computational redundancy occurs because large background regions contribute little to detection, which limits efficiency and real-time capability. Exploiting the sparsity of infrared small targets is therefore a practical direction. A sparse mask generation module that uses target sparsity can coarsely extract potential target regions while suppressing most redundant background areas, and later stages then refine the result.
This study presents a solution that balances detection accuracy and computational efficiency for real-time applications.

Methods An end-to-end infrared small target detection network guided by a differentiable sparse mask is proposed. First, the input infrared image is processed by convolution to obtain raw features. A differentiable sparse mask generation module then uses two convolution branches to generate a probability map and a threshold map, and produces a binary mask through a differentiable binarization function, which extracts candidate target regions and suppresses background redundancy. Next, a target region sampling module converts the dense raw features into sparse features according to the binary mask. A sparse feature extraction module with a U-shaped structure, composed of encoders, decoders, and skip connections, applies Minkowski Engine sparse convolution only to the non-zero target regions for refined processing, thereby reducing computation. Finally, a pyramid pooling module fuses multi-scale sparse features, which are fed into a target-background binary classifier to generate detection results.

Results and Discussions Comprehensive experiments are conducted on two mainstream infrared small target datasets: NUAA-SIRST, which includes 427 real infrared images extracted from real videos, and NUDT-SIRST, a large-scale synthetic dataset containing 1,327 diverse images. Comparisons are made with three representative traditional algorithms (e.g., Top-Hat, IPI) and six state-of-the-art deep learning methods (e.g., DNA-Net, ACM). The proposed method achieves competitive detection performance. On NUAA-SIRST, it attains 74.38% IoU, 100% Pd, and 7.98 × 10⁻⁶ Fa. On NUDT-SIRST, it reaches 83.03% IoU, 97.67% Pd, and 9.81 × 10⁻⁶ Fa, which is comparable to leading deep learning approaches. The network is also efficient, requiring only 0.35 M parameters and 11.10 GFLOPs and running at 215.06 fps.
The frame rate is 4.8 times that of DNA-Net (215.06 fps versus 45.20 fps), indicating a substantial reduction in computational redundancy. Ablation experiments (Fig. 6) confirm that the differentiable sparse mask module effectively suppresses most background regions while retaining target areas. Visual results (Fig. 5) show fewer false alarms than traditional methods such as PSTNN, as the coarse-to-fine strategy reduces background interference and supports a balance between accuracy and efficiency.

Conclusions A fast infrared small target detection network guided by a differentiable sparse mask is proposed to address the severe computational redundancy of dense computation methods, which originates from the extreme imbalance between target and background pixels (targets usually occupy less than 1% of the whole image). A differentiable sparse mask generation module adaptively extracts candidate target regions and filters out background redundancy. A sparse feature extraction module based on Minkowski Engine sparse convolution further reduces computation, forming an end-to-end coarse-to-fine detection framework. Experiments on the NUAA-SIRST and NUDT-SIRST datasets show that the proposed method achieves detection performance comparable to existing deep learning methods while significantly improving computational efficiency. The method supports real-time requirements in scenarios such as remote sensing detection, infrared guidance, and environmental monitoring, and provides a practical reference for lightweight development in infrared small target detection.
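The mask generation and target region sampling steps described under Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the differentiable binarization takes the form B = 1 / (1 + exp(-k(P - T))) used for scene text detection in [29], with an assumed amplification factor k = 50. The function names `differentiable_binarize` and `sample_sparse`, the 0.5 cutoff, and the toy 3 × 3 maps are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def differentiable_binarize(prob, thresh, k=50.0):
    """Soft binarization B = 1 / (1 + exp(-k * (P - T))), applied per pixel.

    The form and k = 50 are assumptions borrowed from [29].
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


def sample_sparse(features, soft_mask, cutoff=0.5):
    """Keep only the coordinates and feature values where the mask is active."""
    coords, feats = [], []
    for r, row in enumerate(soft_mask):
        for c, m in enumerate(row):
            if m > cutoff:  # hypothetical cutoff for the hard decision
                coords.append((r, c))
                feats.append(features[r][c])
    return coords, feats


# Toy probability map with one bright candidate pixel in the centre.
prob = [[0.1, 0.2, 0.1],
        [0.2, 0.9, 0.3],
        [0.1, 0.2, 0.1]]
thresh = [[0.5] * 3 for _ in range(3)]      # toy threshold map
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0],
            [7.0, 8.0, 9.0]]                # toy dense raw features

mask = differentiable_binarize(prob, thresh)
coords, feats = sample_sparse(features, mask)
print(coords, feats)
```

Only the centre pixel survives the mask in this toy case, so the later stages would process one position instead of nine. In a full network, the retained coordinates and features would be handed to a sparse convolution library such as Minkowski Engine.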
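Table 1. Performance comparison of different methods on the NUAA-SIRST and NUDT-SIRST datasets

| Method | NUAA-SIRST IoU (%) | NUAA-SIRST Pd (%) | NUAA-SIRST Fa (×10⁻⁶) | NUDT-SIRST IoU (%) | NUDT-SIRST Pd (%) | NUDT-SIRST Fa (×10⁻⁶) | #Params (M) | FLOPs (G) | fps |
|---|---|---|---|---|---|---|---|---|---|
| Top-Hat[31] | 7.14 | 79.84 | 1012 | 20.72 | 78.41 | 166.70 | - | - | 336.36 |
| IPI[6] | 25.67 | 85.55 | 11.47 | 17.76 | 74.49 | 41.23 | - | - | 0.12 |
| PSTNN[8] | 22.40 | 77.95 | 29.11 | 14.85 | 66.13 | 44.17 | - | - | 5.40 |
| MDvsFA[11] | 60.30 | 89.35 | 56.35 | 74.14 | 90.47 | 25.34 | 3.92 | 264.96 | 4.72 |
| ACM[12] | 70.33 | 93.91 | 3.73 | 67.08 | 95.97 | 10.18 | 0.52 | 0.43 | 180.32 |
| ISTDU[16] | 58.83 | 89.91 | 40.63 | 78.80 | 97.04 | 21.51 | 2.76 | 7.44 | 134.28 |
| DNA-Net[9] | 76.24 | 97.71 | 12.80 | 87.09 | 98.73 | 4.22 | 4.70 | 14.02 | 45.20 |
| RDIAN[17] | 68.98 | 96.33 | 29.63 | 73.36 | 94.82 | 47.94 | 0.22 | 3.69 | 278.80 |
| HoLoCoNet[18] | 73.89 | 100.00 | 19.87 | 80.90 | 97.67 | 13.54 | 0.70 | 6.60 | 125.49 |
| Proposed method | 74.38 | 100.00 | 7.98 | 83.03 | 97.67 | 9.81 | 0.35 | 11.10 | 215.06 |

Table 2. Results of verifying the effectiveness of the pyramid pooling module on the NUAA-SIRST dataset

| Method | IoU (%) | Pd (%) | Fa (×10⁻⁶) | #Params (M) | FLOPs (G) | fps |
|---|---|---|---|---|---|---|
| Ablation model | 67.7 | 98.17 | 30.10 | 0.34 | 8.68 | 252.05 |
| Proposed method | 74.38 | 100.00 | 7.98 | 0.35 | 11.10 | 215.06 |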
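The efficiency figures quoted in the abstract can be checked against Tables 1 and 2. The short script below is only arithmetic on the reported values. The variable names are illustrative, and it assumes the ablation model in Table 2 is the network without the pyramid pooling module, as the table caption suggests.

```python
# Values copied from Table 1 (proposed method vs. DNA-Net).
ours = {"params_m": 0.35, "gflops": 11.10, "fps": 215.06}
dna_net = {"params_m": 4.70, "gflops": 14.02, "fps": 45.20}

speedup = ours["fps"] / dna_net["fps"]                  # frame-rate ratio
param_ratio = dna_net["params_m"] / ours["params_m"]    # parameter-count ratio

# Values copied from Table 2 (NUAA-SIRST, ablation model vs. full method).
ablation = {"iou": 67.7, "gflops": 8.68, "fps": 252.05}
full = {"iou": 74.38, "gflops": 11.10, "fps": 215.06}

iou_gain = full["iou"] - ablation["iou"]                # IoU points gained
extra_gflops = full["gflops"] - ablation["gflops"]      # added compute cost

print(f"fps ratio vs. DNA-Net: {speedup:.1f}x")
print(f"parameter ratio vs. DNA-Net: {param_ratio:.1f}x")
print(f"IoU gain of full method over ablation model: {iou_gain:.2f} points")
print(f"extra cost of full method: {extra_gflops:.2f} GFLOPs")
```

The frame-rate ratio rounds to 4.8, matching the figure stated in the abstract. The Table 2 rows show the trade-off: about 6.68 IoU points gained for roughly 2.42 additional GFLOPs.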
[1] HAN Zonghao, ZHANG Ziye, ZHANG Shun, et al. Aerial visible-to-infrared image translation: Dataset, evaluation, and baseline[J]. Journal of Remote Sensing, 2023, 3: 0096. doi: 10.34133/remotesensing.0096.
[2] WANG Qunming and HUANG Ruijie. RES-STF: Spatio temporal fusion of visible infrared imaging radiometer suite and landsat land surface temperature based on restormer[J]. Journal of Remote Sensing, 2024, 4: 0208. doi: 10.34133/remotesensing.0208.
[3] ZHANG Jingjing, CAO Sihua, CUI Wennan, et al. Improved top-hat transform-based algorithm for infrared dim and small target detection[J]. Journal of Electronics & Information Technology, 2024, 46(1): 267–276. doi: 10.11999/JEIT221562.
[4] HAN Jinhui, MORADI S, FARAMARZI I, et al. A local contrast method for infrared small-target detection utilizing a tri-layer window[J]. IEEE Geoscience and Remote Sensing Letters, 2020, 17(10): 1822–1826. doi: 10.1109/LGRS.2019.2954578.
[5] CHEN C L P, LI Hong, WEI Yantao, et al. A local contrast method for small infrared target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(1): 574–581. doi: 10.1109/TGRS.2013.2242477.
[6] GAO Chenqiang, MENG Deyu, YANG Yi, et al. Infrared patch-image model for small target detection in a single image[J]. IEEE Transactions on Image Processing, 2013, 22(12): 4996–5009. doi: 10.1109/TIP.2013.2281420.
[7] LIU Ting, YANG Jungang, LI Boyang, et al. Nonconvex tensor low-rank approximation for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5614718. doi: 10.1109/TGRS.2021.3130310.
[8] ZHANG Landan and PENG Zhenming. Infrared small target detection based on partial sum of the tensor nuclear norm[J]. Remote Sensing, 2019, 11(4): 382. doi: 10.3390/rs11040382.
[9] LI Boyang, XIAO Chao, WANG Longguang, et al. Dense nested attention network for infrared small target detection[J]. IEEE Transactions on Image Processing, 2023, 32: 1745–1758. doi: 10.1109/TIP.2022.3199107.
[10] HU Chen, HUANG Yian, LI Kexuan, et al. DATransNet: Dynamic attention transformer network for infrared small target detection[J]. IEEE Geoscience and Remote Sensing Letters, 2025, 22: 7001005. doi: 10.1109/LGRS.2025.3557021.
[11] WANG Huan, ZHOU Luping, and WANG Lei. Miss detection vs. false alarm: Adversarial learning for small object segmentation in infrared images[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 2019: 8508–8517. doi: 10.1109/ICCV.2019.00860.
[12] DAI Yimian, WU Yiquan, ZHOU Fei, et al. Asymmetric contextual modulation for infrared small target detection[C]. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2021: 949–958. doi: 10.1109/WACV48630.2021.00099.
[13] DAI Yimian, WU Yiquan, ZHOU Fei, et al. Attentional local contrast networks for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(11): 9813–9824. doi: 10.1109/TGRS.2020.3044958.
[14] ZHANG Mingjin, ZHANG Rui, YANG Yuxiang, et al. ISNet: Shape matters for infrared small target detection[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 867–876. doi: 10.1109/CVPR52688.2022.00095.
[15] WU Xin, HONG Danfeng, and CHANUSSOT J. UIU-net: U-net in U-net for infrared small object detection[J]. IEEE Transactions on Image Processing, 2023, 32: 364–376. doi: 10.1109/TIP.2022.3228497.
[16] HOU Qingyu, ZHANG Liuwei, TAN Fanjiao, et al. ISTDU-Net: Infrared small-target detection U-Net[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 7506205. doi: 10.1109/LGRS.2022.3141584.
[17] SUN Heng, BAI Junxiang, YANG Fan, et al. Receptive-field and direction induced attention network for infrared dim small target detection with a large-scale dataset IRDST[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5000513. doi: 10.1109/TGRS.2023.3235150.
[18] CHEN Gao, WANG Zhuang, WANG Weihua, et al. Holistic modularization of local contrast in the end-to-end network for infrared small target detection[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: 7001305. doi: 10.1109/LGRS.2023.3320191.
[19] ZHANG Mingjin, YUE Ke, LI Boyang, et al. Single-frame infrared small target detection via Gaussian curvature inspired network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5005013. doi: 10.1109/TGRS.2024.3423492.
[20] REN Xiangyang, JIAO Boyang, PENG Zhenming, et al. MSFFNet: A multilevel sparse feature fusion network for infrared dim small target detection[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 147–159. doi: 10.1109/JSTARS.2024.3488698.
[21] ZHANG Luping, LUO Junhai, HUANG Yian, et al. MDIGCNet: Multidirectional information-guided contextual network for infrared small target detection[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 2063–2076. doi: 10.1109/JSTARS.2024.3508255.
[22] WU Shuanglin, XIAO Chao, WANG Yingqian, et al. Sparsity-aware global channel pruning for infrared small-target detection networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5615011. doi: 10.1109/TGRS.2025.3544645.
[23] CHUNG W Y, LEE I H, and PARK C G. Lightweight infrared small target detection network using full-scale skip connection U-Net[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: 7000705. doi: 10.1109/LGRS.2023.3276326.
[24] KOU Renke, WANG Chunping, YU Ying, et al. LW-IRSTNet: Lightweight infrared small target segmentation network and application deployment[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5621313. doi: 10.1109/TGRS.2023.3314586.
[25] MA Tianlei, YANG Zhen, SONG Yifan, et al. DMEF-Net: Lightweight infrared dim small target detection network for limited samples[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5626015. doi: 10.1109/TGRS.2023.3333378.
[26] ZHANG Mingjin, YANG Handi, GUO Jie, et al. IRPruneDet: Efficient infrared small target detection via wavelet structure-regularized soft channel pruning[C]. The Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 7224–7232. doi: 10.1609/aaai.v38i7.28551.
[27] LI Boyang, WANG Longguang, WANG Yingqian, et al. Mixed-precision network quantization for infrared small target segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5000812. doi: 10.1109/TGRS.2023.3346904.
[28] XIAO Chao, AN Wei, ZHANG Yifan, et al. Highly efficient and unsupervised framework for moving object detection in satellite videos[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 11532–11539. doi: 10.1109/TPAMI.2024.3409824.
[29] LIAO Minghui, ZOU Zhisheng, WAN Zhaoyi, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919–931. doi: 10.1109/TPAMI.2022.3155612.
[30] SMITH L N and TOPIN N. Super-convergence: Very fast training of neural networks using large learning rates[C]. Proceedings Volume 11006, Artificial Intelligence and Machine Learning for Multi-domain Operations Applications, Baltimore, United States, 2019: 369–386. doi: 10.1117/12.2520589.
[31] RIVEST J F and FORTIN R. Detection of dim targets in digital infrared imagery by morphological image processing[J]. Optical Engineering, 1996, 35(7): 1886–1893. doi: 10.1117/1.600620.