A Long-Short Term Fusion Spiking Neural Network for Detecting Tiny Moving Targets in Dynamic Vision

LI Miao; ZHANG Heng; CHEN Nuo; SHI Yangsi; HE Shiman; AN Wei

doi:10.11999/JEIT250785

cstr: 32379.14.JEIT250785

College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China

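Author biographies:
LI Miao: male, associate research fellow; research interest: intelligent electro-optical sensing

ZHANG Heng: male, master's student; research interest: spiking neural networks

CHEN Nuo: male, Ph.D. candidate; research interest: bionic electro-optical detection

SHI Yangsi: male, Ph.D. candidate; research interest: multimodal electro-optical fusion processing

HE Shiman: female, master's student; research interest: event camera data processing

AN Wei: female, professor; research interest: spatial information acquisition and processing

Corresponding author:
CHEN Nuo　chennuo97@nudt.edu.cn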

CLC number: TN911.73; TP391.41
Publication history
- Received: 2025-08-22
- Revised: 2026-01-16
- Accepted: 2026-02-09
- Published online: 2026-03-01

A Long-Short Term Fusion Spiking Neural Network for Detecting Tiny Moving Targets in Dynamic Vision

(College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China)

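Abstract

Abstract: The dynamic vision mechanism offers low data redundancy and a high event sampling rate, making it an ideal detection mode for long-distance electro-optical surveillance systems. However, targets in such systems appear as tiny moving targets within sparse event streams, so methods designed for conventional targets with distinct shapes are difficult to apply. To address this problem, and inspired by the third-generation neural networks of brain-like processing, this paper combines the asynchronous sensing and spiking representation of the dynamic vision mechanism to design a long-short term fusion spiking neural network for tiny moving targets. For the morphological diffusion of targets, a spiking Swin Transformer module is designed, in which a spiking self-attention mechanism adaptively learns the correlation between a tiny target and its neighboring spatiotemporal pixels. For the motion continuity of targets, the ConvLSTM neuron is modeled in spiking form to obtain a spiking ConvLSTM module suited to event data, which automatically learns motion information over the long temporal domain. Combined with structures such as a spiking pyramid module, the network fuses dual-path multi-scale features and thereby mines high-dimensional deep features from extremely limited surface features. Tests on measured data show that the recall of the proposed method for tiny moving targets exceeds 95%, and ablation experiments verify that adding a long-term feature learning module and using event data of longer duration effectively improves performance.
- Tiny target /
- Target detection /
- Spiking neural network /
- Dynamic vision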
Abstract: Objective Long-distance electro-optical surveillance systems are widely used in fields such as space debris monitoring and warning of unauthorized drone flights. Targets in these systems appear at random and move rapidly, and because of the long detection distance they occupy only a very small area on the optical detector, with no obvious morphological or texture features; they are therefore tiny moving targets. The traditional mechanism for sensing tiny moving targets adopts an "image frame imaging + artificial neural network processing" approach, which entails large data volumes, high computing power, and high energy consumption, and has become a bottleneck for making such systems lightweight. In recent years, inspired by bionic perception and brain-like processing, "dynamic vision detection + brain-like processing" has become a frontier mechanism. Dynamic vision offers low redundancy and high temporal resolution, but its output is a sparse event stream rather than regular image frames, so new processing methods are needed. The spiking neural network, known as the third-generation neural network, features sparse connections and spiking representation, and is naturally compatible with the asynchronous event triggering and bright-dark pulse output of dynamic vision. However, existing spiking neural network methods are mainly oriented towards targets with distinct shapes in fields such as autonomous driving, and are difficult to adapt to the tiny moving targets in long-distance electro-optical surveillance systems. To address these problems, this paper designs a long-short term fusion spiking neural network, providing dedicated algorithmic support for applying dynamic vision to the detection of tiny moving targets.
Methods The proposed network architecture consists of four key components. First, a short-term feature extraction module, the Spiking Swin Transformer (SST), is designed to capture the morphological expansion characteristic of tiny targets, focusing on spatiotemporal correlations across adjacent time steps and spatial neighborhoods. It integrates a spiking self-attention mechanism to adaptively learn irregular pixel correlations and temporal dependencies. Second, a long-term feature extraction module, the spiking ConvLSTM (SCL), is designed to learn motion continuity, which is embedded in long temporal sequences; the longer the temporal domain, the richer the learnable features. The spiking ConvLSTM is modeled on the ANN-style ConvLSTM and exploits the inherent strengths of spiking recurrent neural networks in temporal signal processing to emphasize autonomous memorization of long-term temporal information. Third, the dual-path features from SST and SCL are combined through tensor alignment and additive integration in a Spiking Feature Pyramid Network (SFPN), whose spiking pyramid operations fuse cross-scale spatiotemporal features across network depths. Finally, tiny targets are extracted by the detection head.
Results and Discussions The proposed algorithm was validated on real dynamic vision data for drone detection. Test results show performance improvements across different metrics. Compared with the method based on short-term temporal features alone, the proposed method achieves an increase of about 1.3% in recall and about 0.9% in precision, enabling more precise detection of tiny moving targets. The F1-score analysis further shows that the proposed approach raises recall by about 1.3% while simultaneously reducing false alarms. This confirms that the dual-path spiking memory network for long-term feature extraction enhances the model's capability to discern subtle target characteristics. Specifically, incorporating long-term temporal features contributes to the overall performance gain by allowing better discrimination between noise events and genuine tiny targets.
Conclusions This paper addresses the detection of tiny moving targets under dynamic vision and proposes a method based on a long-short term fusion spiking neural network. Considering the morphological expansion characteristics and motion continuity of tiny targets, the paper designs a spiking Swin Transformer module and a spiking ConvLSTM module, respectively, and fuses multi-scale dual-path features through a spiking pyramid module. By learning high-dimensional features within different time windows, it achieves in-depth mining and automatic learning from limited surface features. The performance advantages of the proposed method are verified on real data, with a recall of over 95%, outperforming the comparison algorithms. Ablation experiments demonstrate the importance of a long-term feature network and of longer-duration event data for improving tiny target detection. The method naturally combines the sparse event streams of dynamic vision with spiking neural mechanisms, providing algorithmic support for applying the new "bionic detection + brain-like processing" perception mode in long-distance electro-optical surveillance systems.
- Tiny target /
- Target detection /
- Spiking Neural Network /
- Dynamic Vision
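
As a minimal sketch of how a sparse event stream might be prepared for a network that processes a fixed number of time steps, the snippet below bins events into per-step ON/OFF polarity maps. The `(t, x, y, p)` tuple layout, the function name `bin_events`, and the equal-width time bins are assumptions made for illustration; this page does not specify the event representation used in the paper.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp, x, y, polarity), polarity in {0, 1}
Event = Tuple[float, int, int, int]


def bin_events(events: List[Event], width: int, height: int,
               num_steps: int) -> List[List[List[List[int]]]]:
    """Return frames[step][polarity][y][x]; 1 means at least one event fell there."""
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(num_steps)]
    if not events:
        return frames
    t_start = min(e[0] for e in events)
    t_end = max(e[0] for e in events)
    span = (t_end - t_start) or 1.0  # guard against a zero-length time window
    for t, x, y, p in events:
        # Map the timestamp to a bin index; the last timestamp goes to the last bin
        step = min(int((t - t_start) / span * num_steps), num_steps - 1)
        frames[step][p][y][x] = 1
    return frames


events = [(0.0, 1, 1, 1), (0.4, 2, 1, 1), (0.9, 3, 2, 0)]
frames = bin_events(events, width=4, height=3, num_steps=2)
```

Most entries of `frames` stay zero, which reflects the sparsity the abstract describes: a tiny target contributes only a handful of spikes per time step.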

Figure 1 Target morphology in dynamic vision detection

Figure 2 Overall architecture of the proposed network

Figure 3 Structure of the spiking Swin Transformer module

Figure 4 Structure of the local sliding-window spiking self-attention mechanism (WSSA)
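
The idea of attention restricted to local windows of spike tokens can be sketched as follows. The code uses the softmax-free form commonly seen in spiking transformers (binary Q, K, V, a scaled product, then re-thresholding to spikes). Whether the paper's WSSA matches this form is not stated on this page, and the 1-D windows, `scale`, and `threshold` values are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def heaviside(a: Matrix, threshold: float) -> Matrix:
    """Emit a spike (1) wherever the input reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in a]


def window_spiking_attention(q: Matrix, k: Matrix, v: Matrix, window: int,
                             scale: float, threshold: float) -> Matrix:
    """Softmax-free attention computed independently inside each token window."""
    out: Matrix = []
    for start in range(0, len(q), window):
        qw, kw, vw = (m[start:start + window] for m in (q, k, v))
        scores = matmul(qw, transpose(kw))        # spike-overlap counts between tokens
        mixed = matmul(scores, vw)                # aggregate value spikes per token
        scaled = [[scale * x for x in row] for row in mixed]
        out.extend(heaviside(scaled, threshold))  # convert back to binary spikes
    return out


# Four tokens with two channels each, split into two windows of two tokens
q = [[1, 0], [0, 1], [1, 1], [0, 0]]
k = [[1, 0], [1, 1], [0, 1], [0, 0]]
v = [[1, 0], [0, 1], [1, 1], [1, 0]]
spikes = window_spiking_attention(q, k, v, window=2, scale=0.5, threshold=0.5)
```

Because Q and K are binary, the score matrix reduces to integer overlap counts, so the attention step needs no exponentials and its output stays in spike form.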

Figure 5 Structure of the spiking ConvLSTM neuron
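
For reference, the ANN-style ConvLSTM that the spiking ConvLSTM neuron is modeled on is usually written as below, with peephole terms omitted. This is the standard formulation, not the paper's spiking version, which is not given on this page. Here $*$ denotes convolution, $\odot$ the Hadamard product, and the weight and bias symbols are generic notation introduced for this sketch.

$$
\begin{aligned}
i_t &= \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o) \\
g_t &= \tanh(W_{xg} * X_t + W_{hg} * H_{t-1} + b_g) \\
C_t &= f_t \odot C_{t-1} + i_t \odot g_t \\
H_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

The cell state $C_t$ carries information across many time steps, which is why a module of this kind suits the long-term motion continuity described in the abstract.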

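Figure 6 Structure of the spiking pyramid module

Figure 7 Tiny moving target dataset

Figure 8 Detection results of different methods under random noise points (Example 1)

Figure 9 Detection results of different methods under random noise points (Example 2)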

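Table 1 Performance comparison of different target detection algorithms

Method	Re (%)	Pr (%)	F1 (%)	Fa ($\times 10^{-6}$)
VGG-based short-term spiking neural network	89.6	82.1	85.7	2.17
SST-based short-term spiking neural network	94.4	84.3	89.1	1.95
Proposed method	95.7	85.2	90.1	1.85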
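
The F1 column can be cross-checked from the recall and precision columns, assuming F1 is the usual harmonic mean of the two (the page does not state the definition). The short script below recomputes F1 for each row of Table 1 and prints the gains of the proposed method over the SST-based baseline. The dictionary keys are shortened labels chosen here.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision, in the same units as the inputs."""
    return 2 * recall * precision / (recall + precision)


# (Re %, Pr %, reported F1 %) copied from Table 1
table1 = {
    "VGG-based short-term SNN": (89.6, 82.1, 85.7),
    "SST-based short-term SNN": (94.4, 84.3, 89.1),
    "Proposed method": (95.7, 85.2, 90.1),
}

for name, (rec, prec, f1_reported) in table1.items():
    print(f"{name}: computed F1 = {f1_score(rec, prec):.1f}, reported F1 = {f1_reported}")

# Gains of the proposed method over the SST-based baseline, in percentage points
base, ours = table1["SST-based short-term SNN"], table1["Proposed method"]
gains = [round(o - b, 1) for o, b in zip(ours, base)]
print("Recall, precision, F1 gains:", gains)
```

The recomputed values agree with the reported F1 column to one decimal place, and the recall and precision gains match the figures of about 1.3 and 0.9 quoted in the abstract.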

Table 2 Performance comparison for different network depths and event durations

Method ($n$, $N$)	Re (%)	Pr (%)	F1 (%)	Fa ($\times 10^{-6}$)	Parameters
Proposed method (1, 5)	90.4	82.0	86.0	2.21	5.9M
Proposed method (2, 5)	93.8	81.3	87.1	2.41	6.0M
Proposed method (1, 8)	94.2	82.8	88.1	2.18	5.9M
Proposed method (2, 8)	95.2	86.1	90.4	1.70	6.0M
Proposed method (1, 10)	95.7	85.2	90.1	1.85	5.9M
Proposed method (2, 10)	95.8	85.4	90.3	1.82	6.0M
SST-based spiking neural network	94.4	84.3	89.1	1.95	4.1M
