A Long-Short Term Fusion Spiking Neural Network for Detecting Tiny Moving Targets in Dynamic Vision
Abstract: Dynamic vision offers low data redundancy and a high event-sampling rate, making it an ideal detection mechanism for long-distance electro-optical surveillance systems. In such systems, however, targets appear as tiny moving targets within sparse event streams, and methods designed for conventional targets with distinct shapes are difficult to apply. To address this problem, inspired by the third-generation neural networks of brain-like processing and building on the asynchronous sensing and spike-based representation of dynamic vision, this paper designs a long-short term fusion spiking neural network for tiny moving targets. To handle the morphological diffusion of targets, a Spiking Swin Transformer module is designed, in which a spiking self-attention mechanism adaptively learns the correlation between a tiny target and its neighboring spatiotemporal pixels. To exploit the motion continuity of targets, the ConvLSTM neuron is given a spiking formulation, yielding a spiking ConvLSTM module suited to event data that automatically learns motion information over long time horizons. Together with structures such as a spiking pyramid module, the network fuses dual-path multi-scale features, mining high-dimensional deep features from extremely limited surface features. Tests on real measured data show that the proposed method achieves a recall rate above 95% for tiny moving targets, and ablation experiments verify that adding the long-term feature learning module and using event data spanning longer durations effectively improves performance.
Objective  Long-distance electro-optical surveillance systems are widely used for applications such as space debris monitoring and unauthorized-drone flight warning. In such systems, targets appear randomly and move rapidly. Because of the long detection distance, targets appear extremely small on the optical sensor and lack obvious morphological or texture features; they are therefore classified as tiny moving targets. Conventional tiny-target perception adopts the “image-frame imaging + artificial neural network processing” paradigm, which generates large data volumes and demands high computational power and energy consumption, restricting lightweight system deployment. In recent years, inspired by bionic perception and brain-like processing, the paradigm of “dynamic vision detection + brain-like processing” has emerged as a new direction. Dynamic vision provides low redundancy and high temporal resolution, but its output is not regular image frames; it is a sparse event stream that requires new processing methods. The Spiking Neural Network (SNN), regarded as the third generation of neural networks, uses sparse connections and spike-based representations and naturally matches the asynchronous event triggering and bright-dark pulse output of dynamic vision sensors. Existing SNN-based methods mainly address targets with clear shapes in scenarios such as autonomous driving and are not well suited to tiny moving targets in long-distance electro-optical surveillance. To address this problem, a Long-Short Term Fusion SNN is proposed to support the application of dynamic vision to tiny moving target detection.

Methods  The proposed network contains four main components. First, a short-term feature extraction module, the Spiking Swin Transformer (SST), is designed to capture the morphological expansion characteristics of tiny targets. The module focuses on spatiotemporal correlations across adjacent time steps and spatial regions, integrating a spiking self-attention mechanism to strengthen the learning of irregular pixel correlations and temporal dependencies. Second, a long-term feature extraction module, the Spiking ConvLSTM (SCL), is proposed to learn the motion continuity embedded in long temporal sequences; a longer temporal range provides richer learnable motion features. The SCL follows the ANN-style ConvLSTM architecture while exploiting the inherent temporal processing capability of spiking recurrent neural networks to strengthen long-term temporal memory. Third, features from the SST and SCL branches are aligned and integrated through tensor alignment and additive fusion, forming the Spiking Feature Pyramid Network (SFPN), which performs spiking pyramid operations to fuse cross-scale spatiotemporal features across different network depths. Finally, a detection head extracts and identifies tiny targets.
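The paper itself ships no code, but a minimal PyTorch sketch can make the SCL idea concrete: an ANN-style ConvLSTM cell whose hidden state is emitted as binary spikes through a Heaviside function with a surrogate gradient. Every name and hyperparameter here (SpikingConvLSTMCell, the rectangular surrogate window, v_threshold) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient (a common SNN trick)."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v >= 0.0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Pass gradients through only near the firing threshold.
        return grad_output * (v.abs() < 0.5).float()


class SpikingConvLSTMCell(nn.Module):
    """Illustrative spiking ConvLSTM cell: ANN-style ConvLSTM gating,
    with the hidden state emitted as binary spikes."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3, v_threshold=0.5):
        super().__init__()
        # One convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)
        self.hidden_channels = hidden_channels
        self.v_threshold = v_threshold  # kept inside (-1, 1), the range of the gated output

    def forward(self, x, state=None):
        if state is None:
            b, _, h, w = x.shape
            zeros = x.new_zeros(b, self.hidden_channels, h, w)
            state = (zeros, zeros)  # (spiking hidden state, analog cell memory)
        h_spk, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h_spk], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)  # long-term memory
        v = torch.sigmoid(o) * torch.tanh(c)                         # membrane-like potential
        h_spk = SurrogateSpike.apply(v - self.v_threshold)           # binary spike output
        return h_spk, (h_spk, c)


# Unroll over N event time bins (2 channels: ON/OFF polarities).
cell = SpikingConvLSTMCell(in_channels=2, hidden_channels=16)
frames = torch.rand(10, 1, 2, 64, 64)  # (N, batch, polarity, H, W)
state = None
for x in frames:
    spikes, state = cell(x, state)
```

Unrolled over the N event time bins, the analog cell memory c accumulates long-term motion cues while the emitted hidden state stays spike-valued, which is what lets such a cell sit inside an otherwise spiking pipeline.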
Results and Discussions  The proposed algorithm is validated on real dynamic vision data for drone detection, and the results show clear improvements across several evaluation metrics. Compared with methods that rely only on short-term temporal features, the proposed method increases recall by about 1.3% and precision by about 0.9% (Table 1), allowing more reliable detection of tiny moving targets. The F1-score rises from 89.1% to 90.1% while the false-alarm rate falls from 1.95×10⁻⁶ to 1.85×10⁻⁶, showing that the recall gain does not come at the cost of additional false alarms. These results confirm that the dual-path spiking memory network for long-term feature extraction strengthens the ability of the model to identify subtle target characteristics. In particular, integrating long-term temporal features improves the discrimination between noise events and genuine tiny targets.

Conclusions  This study addresses tiny moving target detection under dynamic vision and proposes a method based on a Long-Short Term Fusion SNN. Considering the morphological expansion characteristics and motion continuity of tiny targets, the SST and SCL modules are designed to extract short-term and long-term temporal features, and the multi-scale dual-path features are fused through a spiking pyramid module. By learning high-dimensional features across different temporal windows, the method enables deeper mining and automatic learning of the limited surface features of tiny targets. Experiments on real dynamic vision data verify the performance advantage of the proposed method, which achieves a recall rate above 95% and outperforms the comparison algorithms. Ablation experiments further demonstrate that long-term temporal feature learning and longer event-data durations improve tiny target detection performance. The proposed method enables a natural match between the sparse event streams of dynamic vision sensors and spiking neural mechanisms, providing algorithmic support for applying the “bionic detection + brain-like processing” perception paradigm in long-distance electro-optical surveillance systems.
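As a companion sketch (same caveats: DualPathFusion and its parameter names are assumptions, not the paper's code), the tensor-alignment-plus-additive-fusion step that merges the SST and SCL branches before the spiking pyramid could look roughly like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPathFusion(nn.Module):
    """Aligns a short-term (SST) and a long-term (SCL) feature map to a common
    shape, then fuses them additively before the spiking pyramid stage."""

    def __init__(self, sst_channels, scl_channels, out_channels):
        super().__init__()
        # 1x1 convolutions absorb the channel mismatch between the two branches.
        self.align_sst = nn.Conv2d(sst_channels, out_channels, kernel_size=1)
        self.align_scl = nn.Conv2d(scl_channels, out_channels, kernel_size=1)

    def forward(self, f_sst, f_scl):
        f_sst = self.align_sst(f_sst)
        f_scl = self.align_scl(f_scl)
        # Resample the long-term map onto the short-term spatial grid if needed.
        if f_scl.shape[-2:] != f_sst.shape[-2:]:
            f_scl = F.interpolate(f_scl, size=f_sst.shape[-2:], mode="nearest")
        return f_sst + f_scl  # additive fusion keeps the channel count fixed
```

Addition rather than concatenation keeps the channel count, and hence the downstream parameter budget, unchanged; the 1×1 convolutions absorb any channel mismatch between the branches.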
Key words:
- Tiny target
- Target detection
- Spiking neural network
- Dynamic vision
Table 1  Performance comparison of different target detection algorithms

| Method | Re (%) | Pr (%) | F1 (%) | Fa ($\times 10^{-6}$) |
|---|---|---|---|---|
| VGG-based short-term SNN | 89.6 | 82.1 | 85.7 | 2.17 |
| SST-based short-term SNN | 94.4 | 84.3 | 89.1 | 1.95 |
| Proposed method | 95.7 | 85.2 | 90.1 | 1.85 |

Table 2  Performance comparison under different network depths and event durations

| Method ($n$, $N$) | Re (%) | Pr (%) | F1 (%) | Fa ($\times 10^{-6}$) | Params |
|---|---|---|---|---|---|
| Proposed (1, 5) | 90.4 | 82.0 | 86.0 | 2.21 | 5.9M |
| Proposed (2, 5) | 93.8 | 81.3 | 87.1 | 2.41 | 6.0M |
| Proposed (1, 8) | 94.2 | 82.8 | 88.1 | 2.18 | 5.9M |
| Proposed (2, 8) | 95.2 | 86.1 | 90.4 | 1.70 | 6.0M |
| Proposed (1, 10) | 95.7 | 85.2 | 90.1 | 1.85 | 5.9M |
| Proposed (2, 10) | 95.8 | 85.4 | 90.3 | 1.82 | 6.0M |
| SST-based SNN | 94.4 | 84.3 | 89.1 | 1.95 | 4.1M |
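As a quick consistency check on the tables, the F1 column follows the standard harmonic mean of precision and recall (assuming Re, Pr, and F1 carry their usual definitions):

```python
def f1(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)

# Reproduce the F1 column of Table 1 from its Pr and Re columns.
print(round(f1(85.2, 95.7), 1))  # 90.1 -> proposed method
print(round(f1(84.3, 94.4), 1))  # 89.1 -> SST-based short-term SNN
print(round(f1(82.1, 89.6), 1))  # 85.7 -> VGG-based short-term SNN
```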