UMM-Det: A Unified Object Detection Framework for Heterogeneous Multi-Modal Remote Sensing Imagery

ZOU Minrui, LI Yuxuan, DAI Yimian, LI Xiang, CHENG Mingming

Citation: ZOU Minrui, LI Yuxuan, DAI Yimian, LI Xiang, CHENG Mingming. UMM-Det: A Unified Object Detection Framework for Heterogeneous Multi-Modal Remote Sensing Imagery[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250933


doi: 10.11999/JEIT250933 cstr: 32379.14.JEIT250933
Details
    Author biographies:

    ZOU Minrui: Male, Ph.D. candidate. His research interests include remote sensing object detection

    LI Yuxuan: Male, Ph.D. candidate. His research interests include remote sensing object detection

    DAI Yimian: Male, associate professor. His research interests include computer vision and remote sensing object detection

    LI Xiang: Male, associate professor. His research interests include computer vision, image recognition and detection

    CHENG Mingming: Male, professor. His research interests include artificial intelligence and computer vision

    Corresponding author:

    DAI Yimian, yimian.dai@gmail.com

  • CLC number: TP751; TP181


Funds: The National Science Fund for Distinguished Young Scholars (62225604), The National Natural Science Foundation of China (62301261, 62206134), The General Program of Shenzhen Natural Science Foundation (JCYJ20240813114237048), The Natural Science Foundation of Tianjin (25JCQNJC01370), The Supercomputing Center of Nankai University (NKSC)
  • Abstract: Space-based remote sensing object detection faces the challenge of building a unified model that can effectively handle heterogeneous multi-modal data such as Synthetic Aperture Radar (SAR), optical, and infrared imagery. To address this, this paper proposes UMM-Det (Unified Multi-Modal Detector), a unified object detection framework for heterogeneous multi-modal remote sensing imagery that achieves efficient, high-performance detection over multi-source data with a single shared architecture. UMM-Det makes three key improvements over the baseline model SM3Det. First, the original ConvNeXt backbone is replaced with InternImage, whose dynamic sampling and large-receptive-field modeling improve feature extraction for multi-scale, low-contrast targets. Second, for the infrared branch, a spatio-temporal visual prompt module built on temporal information generates high-contrast motion features through a refined frame-difference enhancement strategy; these features serve as prior knowledge that helps the network distinguish dim, small moving targets. Third, to counter the extreme positive/negative sample imbalance of dim small targets in infrared sequences, the detection head is optimized with Probabilistic Anchor Assignment (PAA), which markedly improves the precision of target sampling and overall detection performance. Experiments on three public datasets (SARDet-50K, DOTA, and SatVideoIRSTD) show that UMM-Det improves mAP@0.5:0.95 by 2.40% on SAR detection and 1.77% on optical detection, and raises the detection rate on infrared sequence small-target detection by 2.54% over the SM3Det baseline. At the same time, the model reduces the parameter count by more than 50% while improving accuracy, demonstrating a combined advantage in accuracy, efficiency, and compactness, and offering an effective path toward a new generation of unified, high-performance space-based remote sensing detection frameworks.
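The abstract's frame-difference enhancement idea (motion features as a prior for dim moving targets) can be illustrated with a minimal sketch. Everything below — the function name `motion_prompt`, the max-over-time aggregation, and the min-max contrast stretch — is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def motion_prompt(frames, eps=1e-6):
    """Sketch of a frame-difference motion prior for an infrared sequence.

    frames: array of shape (T, H, W), a grayscale infrared image sequence.
    Returns a (H, W) map in [0, 1] that highlights moving small targets.
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Absolute differences between consecutive frames accentuate motion
    # while suppressing the static background.
    diffs = np.abs(np.diff(frames, axis=0))          # shape (T-1, H, W)
    # Max over time keeps transient responses from small, fast targets.
    prompt = diffs.max(axis=0)
    # Contrast-stretch to [0, 1] so the map can be fed to the detector
    # as an extra prior channel.
    prompt = (prompt - prompt.min()) / (prompt.max() - prompt.min() + eps)
    return prompt
```

A real spatio-temporal prompt module would learn this enhancement rather than hard-code it; the sketch only shows why frame differences yield a high-contrast cue for dim moving targets.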
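Similarly, the core of Probabilistic Anchor Assignment (PAA, ref. [18]) — fitting a two-component Gaussian mixture to per-anchor joint scores and taking the high-score mode as positives — can be sketched as follows. The hand-rolled EM loop and the 0.5 responsibility threshold are simplifying assumptions for illustration; the published method scores anchors per ground-truth box with a combined classification/localization loss:

```python
import numpy as np

def paa_assign(scores, iters=20):
    """Fit a 1-D two-component Gaussian mixture to anchor scores via EM and
    return a boolean mask of the anchors claimed by the high-score mode."""
    s = np.asarray(scores, dtype=np.float64)
    # Initialize one component at each end of the score range.
    mu = np.array([s.min(), s.max()])
    var = np.full(2, s.var() + 1e-6)
    w = np.array([0.5, 0.5])                         # mixture weights
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        lik = w[:, None] * np.exp(-(s[None] - mu[:, None]) ** 2 / (2 * var[:, None]))
        lik /= np.sqrt(2 * np.pi * var[:, None])
        r = lik / (lik.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: re-estimate means, variances, and weights.
        nk = r.sum(axis=1) + 1e-12
        mu = (r * s[None]).sum(axis=1) / nk
        var = (r * (s[None] - mu[:, None]) ** 2).sum(axis=1) / nk + 1e-6
        w = nk / nk.sum()
    # Anchors mostly explained by the higher-mean component are positives.
    return r[np.argmax(mu)] > 0.5
```

The appeal for dim infrared targets is that the positive/negative boundary adapts to the score distribution of each image instead of relying on a fixed IoU threshold, which is exactly where extreme sample imbalance hurts fixed assignment.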
  • Figure 1  UMM-Det network architecture (the multiple optimizer nodes shown represent the gradient back-propagation paths of the different detection tasks)

    Figure 2  Schematic of the spatio-temporal visual prompt module

    Figure 3  Visualizations on the SatVideoIRSTD infrared sequence small target detection dataset

Table 1  Ablation results of the proposed modules

    | InternImage backbone | PAA detection head | Spatio-temporal visual prompt module | Pd     | Fa      |
    |----------------------|--------------------|--------------------------------------|--------|---------|
    |                      |                    |                                      | 77.13% | 3.24e–4 |
    |                      |                    |                                      | 77.93% | 5.34e–4 |
    |                      |                    |                                      | 78.44% | 8.24e–4 |
    |                      |                    |                                      | 79.67% | 6.61e–4 |

Table 2  Results of different baseline networks on the three modality datasets

    | Method               | SARDet-50K mAP@0.5:0.95 | SARDet-50K mAP@0.5 | DOTA mAP@0.5:0.95 | DOTA mAP@0.5 | SatVideoIRSTD Pd | SatVideoIRSTD Fa | FLOPs   | Params  |
    |----------------------|-------------------------|--------------------|-------------------|--------------|------------------|------------------|---------|---------|
    | RetinaNet [27]       | 53.04                   | 83.99              | –                 | –            | 66.55            | 1.19e–4          | 520.74G | 206.69M |
    | Faster RCNN [28]     | 54.56                   | 85.62              | –                 | –            | 45.05            | 7.99e–5          | 435.69G | 173.55M |
    | Cascade RCNN [29]    | 56.30                   | 85.39              | –                 | –            | 58.06            | 8.44e–5          | 463.44G | 201.30M |
    | GFL [30]             | 59.01                   | 88.77              | –                 | –            | 72.51            | 3.51e–4          | 733.85G | 274.95M |
    | RoI Transformer [31] | –                       | –                  | 45.43             | 76.79        | –                | –                | 520.74G | 206.69M |
    | S2ANet [32]          | –                       | –                  | 39.92             | 76.20        | –                | –                | 463.44G | 201.30M |
    | VAN-T [24]           | 49.28                   | 80.85              | 43.60             | 74.73        | 70.52            | 3.05e–4          | 270.47G | 45.32M  |
    | VAN-S [24]           | 57.98                   | 88.36              | 45.50             | 76.66        | 74.84            | 4.94e–4          | 366.56G | 64.87M  |
    | LSKNet-T [25]        | 49.95                   | 81.76              | 43.56             | 75.44        | 70.81            | 3.51e–4          | 269.38G | 45.03M  |
    | LSKNet-S [25]        | 58.41                   | 88.48              | 44.80             | 76.69        | 74.51            | 6.13e–4          | 369.67G | 65.37M  |
    | PVT-v2-T [26]        | 48.58                   | 80.71              | 42.72             | 75.39        | 71.94            | 3.88e–4          | 236.92G | 40.20M  |
    | PVT-v2-S [26]        | 54.53                   | 85.48              | 44.37             | 77.53        | 75.19            | 6.83e–4          | 293.87G | 51.45M  |
    | SM3Det [15]          | 60.64                   | 89.94              | 46.47             | 77.88        | 77.13            | 3.24e–4          | 741.29G | 164.29M |
    | UMM-Det              | 63.04                   | 91.55              | 48.24             | 80.91        | 79.67            | 6.61e–4          | 977.31G | 76.64M  |
  • [1] AN Chengjin, YANG Jungang, LIANG Zhengyu, et al. Closely spaced objects super-resolution method using array camera images[J]. Journal of Electronics & Information Technology, 2023, 45(11): 4050–4059. doi: 10.11999/JEIT230810.
    [2] YANG Jungang, LIU Ting, LIU Yongxian, et al. Infrared small target detection method based on nonconvex low-rank Tucker decomposition[J]. Journal of Infrared and Millimeter Waves, 2025, 44(2): 311–325. doi: 10.11972/j.issn.1001-9014.2025.02.018.
    [3] LIN Zaiping, LUO Yihang, LI Boyang, et al. Gradient-aware channel attention network for infrared small target image denoising before detection[J]. Journal of Infrared and Millimeter Waves, 2024, 43(2): 254–260. doi: 10.11972/j.issn.1001-9014.2024.02.015.
    [4] SHI Qian, HE Da, LIU Zhengyu, et al. Globe230k: A benchmark dense-pixel annotation dataset for global land cover mapping[J]. Journal of Remote Sensing, 2023, 3: 0078. doi: 10.34133/remotesensing.0078.
    [5] TIAN Jiaqi, ZHU Xiaolin, SHEN Miaogen, et al. Effectiveness of spatiotemporal data fusion in fine-scale land surface phenology monitoring: A simulation study[J]. Journal of Remote Sensing, 2024, 4: 0118. doi: 10.34133/remotesensing.0118.
    [6] LIU Shuaijun, LIU Jia, TAN Xiaoyue, et al. A hybrid spatiotemporal fusion method for high spatial resolution imagery: Fusion of gaofen-1 and sentinel-2 over agricultural landscapes[J]. Journal of Remote Sensing, 2024, 4: 0159. doi: 10.34133/remotesensing.0159.
    [7] MEI Shaohui, LIAN Jiawei, WANG Xiaofei, et al. A comprehensive study on the robustness of deep learning-based image classification and object detection in remote sensing: Surveying and benchmarking[J]. Journal of Remote Sensing, 2024, 4: 0219. doi: 10.34133/remotesensing.0219.
    [8] GUO Xin, LAO Jiangwei, DANG Bo, et al. SkySense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 27662–27673. doi: 10.1109/CVPR52733.2024.02613.
    [9] ZHANG Yingying, RU Lixiang, WU Kang, et al. SkySense V2: A unified foundation model for multi-modal remote sensing[J]. arXiv: 2507.13812, 2025. doi: 10.48550/arXiv.2507.13812.
    [10] BI Hanbo, FENG Yingchao, TONG Boyuan, et al. RingMoE: Mixture-of-modality-experts multi-modal foundation models for universal remote sensing image interpretation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025: 1–18. doi: 10.1109/TPAMI.2025.3643453.
    [11] LI Xuyang, LI Chenyu, GHAMISI P, et al. FlexiMo: A flexible remote sensing foundation model[J]. arXiv: 2503.23844, 2025. doi: 10.48550/arXiv.2503.23844.
    [12] YAO Kelu, XU Nuo, YANG Rong, et al. Falcon: A remote sensing vision-language foundation model (technical report)[J]. arXiv: 2503.11070, 2025. doi: 10.48550/arXiv.2503.11070.
    [13] QIN Xiaolei, WANG Di, ZHANG Jing, et al. TiMo: Spatiotemporal foundation model for satellite image time series[J]. arXiv: 2505.08723, 2025. doi: 10.48550/arXiv.2505.08723.
    [14] YAO Liang, LIU Fan, CHEN Delong, et al. RemoteSAM: Towards segment anything for earth observation[C]. Proceedings of the 33rd ACM International Conference on Multimedia, Dublin, Ireland, 2025: 3027–3036. doi: 10.1145/3746027.3754950.
    [15] LI Yuxuan, LI Xiang, LI Yunheng, et al. SM3Det: A unified model for multi-modal remote sensing object detection[J]. arXiv: 2412.20665, 2024. doi: 10.48550/arXiv.2412.20665.
    [16] WANG Wenhai, DAI Jifeng, CHEN Zhe, et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 14408–14419. doi: 10.1109/CVPR52729.2023.01385.
    [17] LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 11966–11976. doi: 10.1109/CVPR52688.2022.01167.
    [18] KIM K and LEE H S. Probabilistic anchor assignment with IoU prediction for object detection[C]. Proceedings of 16th European Conference on Computer Vision – ECCV 2020, Glasgow, UK, 2020: 355–371. doi: 10.1007/978-3-030-58595-2_22.
    [19] LI Yuxuan, LI Xiang, LI Weijie, et al. SARDet-100K: Towards open-source benchmark and toolkit for large-scale SAR object detection[C]. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 4079.
    [20] XIA Guisong, BAI Xiang, DING Jian, et al. DOTA: A large-scale dataset for object detection in aerial images[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3974–3983. doi: 10.1109/CVPR.2018.00418.
    [21] LI Ruojing, AN Wei, YING Xinyi, et al. Probing deep into temporal profile makes the infrared small target detector much better[J]. arXiv: 2506.12766, 2025. doi: 10.48550/arXiv.2506.12766.
    [22] LI Zhaoxu, XU Qingyu, AN Wei, et al. A lightweight dark object detection network for infrared images[J]. Journal of Infrared and Millimeter Waves, 2025, 44(2): 299–310. doi: 10.11972/j.issn.1001-9014.2025.02.017.
    [23] YING Xinyi, LIU Li, LIN Zaiping, et al. Infrared small target detection in satellite videos: A new dataset and a novel recurrent feature refinement framework[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5002818. doi: 10.1109/TGRS.2025.3542368.
    [24] GUO Menghao, LU Chengze, LIU Zhengning, et al. Visual attention network[J]. Computational Visual Media, 2023, 9(4): 733–752. doi: 10.1007/s41095-023-0364-2.
    [25] LI Yuxuan, HOU Qibin, ZHENG Zhaohui, et al. Large selective kernel network for remote sensing object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 16748–16759. doi: 10.1109/ICCV51070.2023.01540.
    [26] WANG Wenhai, XIE Enze, LI Xiang, et al. PVT v2: Improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415–424. doi: 10.1007/s41095-022-0274-8.
    [27] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 2999–3007. doi: 10.1109/ICCV.2017.324.
    [28] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
    [29] CAI Zhaowei and VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6154–6162. doi: 10.1109/CVPR.2018.00644.
    [30] LI Xiang, WANG Wenhai, WU Lijun, et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection[C]. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 1763.
    [31] DING Jian, XUE Nan, LONG Yang, et al. Learning RoI transformer for oriented object detection in aerial images[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 2844–2853. doi: 10.1109/CVPR.2019.00296.
    [32] HAN Jiaming, DING Jian, LI Jie, et al. Align deep features for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5602511. doi: 10.1109/TGRS.2021.3062048.
Publication history
  • Revised: 2026-01-04
  • Accepted: 2026-01-04
  • Available online: 2026-01-10
