Semantic-guided Unified Multi-scale Deep Unrolling Network for Pansharpening

CHEN Junjie, WANG Tingting, FANG Faming, ZHANG Guixu

Citation: CHEN Junjie, WANG Tingting, FANG Faming, ZHANG Guixu. Semantic-guided Unified Multi-scale Deep Unrolling Network for Pansharpening[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251252


doi: 10.11999/JEIT251252 cstr: 32379.14.JEIT251252
Funds: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0161800), the National Natural Science Foundation of China (62202173, 62271203), and the Open Research Fund of KLATASDS-MOE, East China Normal University
Details
    Author biographies:

    CHEN Junjie: male, master's student; research interests: remote sensing image fusion

    WANG Tingting: female, postdoctoral researcher; research interests: image enhancement and image fusion

    FANG Faming: male, professor; research interests: machine learning and image processing

    ZHANG Guixu: male, professor; research interests: image processing and pattern recognition

    Corresponding author:

    ZHANG Guixu, gxzhang@cs.ecnu.edu.cn

  • CLC number: TP751

  • Abstract: Existing deep-learning-based pansharpening methods are usually trained on a specific dataset, which limits their generalization ability and makes them hard to deploy in real multi-satellite scenarios. To address this, this paper proposes a Semantic-guided Unified Multi-scale Deep Unrolling Network (SUM-DUN). The network is derived from the optimization of the classical fusion problem and adopts a 3D architecture so that it can accept multispectral (MS) inputs with different numbers of bands. Through a multi-scale hierarchical feature-processing mechanism, SUM-DUN effectively extracts and fuses features at different levels. More importantly, to achieve all-in-one fusion, a multimodal large language model is introduced to generate generic semantic text prompts from the input low-resolution multispectral (LRMS) and panchromatic (PAN) images, which dynamically guide the network to adaptively select the optimal feature-transfer strategy. Experiments on multiple satellites show that the proposed method yields significant improvements in both subjective visual quality and objective metrics across several datasets.
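The deep-unrolling design described above alternates, at each stage, a gradient-descent step on the observation model with a learned proximal network (cf. Fig. 2). Below is a minimal NumPy sketch of one such stage; all operators are illustrative assumptions, not the paper's implementation (block averaging stands in for blur-plus-decimation, a flat spectral response vector stands in for the PAN model, and clipping stands in for the learned proximal network):

```python
import numpy as np

def downsample(x, r=4):
    """Spatial degradation D: r x r block averaging over a (bands, H, W) tensor."""
    c, h, w = x.shape
    return x.reshape(c, h // r, r, w // r, r).mean(axis=(2, 4))

def upsample(x, r=4):
    """Adjoint D^T of block averaging: replicate each pixel and scale by 1/r^2."""
    return np.repeat(np.repeat(x, r, axis=1), r, axis=2) / (r * r)

def unrolling_stage(U, lrms, pan, R, eta=0.5):
    """One stage: gradient step on ||D U - LRMS||^2 + ||R U - PAN||^2,
    followed by a (here trivial, in practice learned) proximal mapping."""
    grad_spatial = upsample(downsample(U) - lrms)                          # D^T(D U - L)
    grad_spectral = R[:, None, None] * (np.tensordot(R, U, axes=1) - pan)  # R^T(R U - P)
    V = U - eta * (grad_spatial + grad_spectral)
    return np.clip(V, 0.0, 1.0)  # placeholder for the proximal network

# toy data: 4-band HRMS estimate, LRMS at 1/4 resolution, single-band PAN
rng = np.random.default_rng(0)
U = rng.random((4, 16, 16))
lrms = downsample(rng.random((4, 16, 16)))
pan = rng.random((16, 16))
R = np.full(4, 0.25)  # assumed flat spectral response
U1 = unrolling_stage(U, lrms, pan, R)
```

With the step size chosen well below the inverse Lipschitz constant of the gradient, each stage is guaranteed not to increase the data-fidelity objective; in the actual network the proximal step and feature transfer are learned rather than fixed.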
  • Fig. 1  Overall framework of the proposed algorithm

    Fig. 2  Structures of the gradient-descent module and the proximal network

    Fig. 3  Reduced-resolution fusion results and residual maps on the GF-1 test set

    Fig. 4  Reduced-resolution fusion results and residual maps on the WV-2 test set

    Fig. 5  Full-resolution fusion results on the GF-1 test set

    Fig. 6  t-SNE visualization of SPIM input and output features with corresponding semantic examples

    Table 1  Datasets used in the experiments

    | Satellite | MS res. (m) | PAN res. (m) | MS size | PAN size | #Train | #Val | #Reduced-res. test | #Full-res. test |
    |---|---|---|---|---|---|---|---|---|
    | GF-1 | 8.0 | 2.0 | 32×32×4 | 128×128 | 1386 | 154 | 100 | 64 |
    | QB | 2.44 | 0.61 | 32×32×4 | 128×128 | 1710 | 190 | 100 | 64 |
    | WV-4 | 1.2 | 0.3 | 32×32×4 | 128×128 | 1710 | 190 | 100 | 64 |
    | WV-2 | 2.0 | 0.5 | 32×32×8 | 128×128 | 1710 | 190 | 100 | 64 |

    Table 2  Quantitative comparison on the GF-1 test set ($Q_4$ through PSNR are reduced-resolution metrics; $D_\lambda$, $D_S$, and HQNR are full-resolution metrics)

    | Method | $Q_4$ | QAVE | SAM (rad) | ERGAS | SCC | PSNR (dB) | $D_\lambda$ | $D_S$ | HQNR |
    |---|---|---|---|---|---|---|---|---|---|
    | BDSD | 0.7526 | 0.7874 | 1.9305 | 1.5777 | 0.9446 | 39.6160 | 0.0415 | 0.0468 | 0.9139 |
    | PRACS | 0.7314 | 0.7622 | 1.8491 | 1.5341 | 0.9377 | 39.4899 | 0.0632 | 0.1679 | 0.7810 |
    | AWFLN | 0.9292 | 0.9394 | 0.6171 | 0.6377 | 0.9909 | 49.7031 | 0.0145 | 0.1600 | 0.8280 |
    | FusionMamba | 0.9199 | 0.9368 | 0.6155 | 0.6418 | 0.9918 | 49.9123 | 0.0180 | 0.1605 | 0.8246 |
    | PanMamba | 0.9502 | 0.9572 | 0.4932 | 0.5391 | 0.9940 | 51.4618 | 0.0156 | 0.1632 | 0.8239 |
    | WFANet | 0.9443 | 0.9550 | 0.4729 | 0.5290 | 0.9947 | 51.9493 | 0.0115 | 0.1599 | 0.8307 |
    | TMDiff | 0.9350 | 0.9410 | 0.5481 | 0.6313 | 0.9924 | 50.4383 | 0.0305 | 0.1939 | 0.7818 |
    | Proposed | 0.9539 | 0.9614 | 0.4366 | 0.5104 | 0.9953 | 52.5321 | 0.0103 | 0.0816 | 0.9089 |

    Table 3  Quantitative comparison on the WV-2 test set ($Q_8$ through PSNR are reduced-resolution metrics; $D_\lambda$, $D_S$, and HQNR are full-resolution metrics)

    | Method | $Q_8$ | QAVE | SAM (rad) | ERGAS | SCC | PSNR (dB) | $D_\lambda$ | $D_S$ | HQNR |
    |---|---|---|---|---|---|---|---|---|---|
    | BDSD | 0.6914 | 0.7031 | 4.9777 | 4.3408 | 0.9434 | 36.7286 | 0.0525 | 0.1465 | 0.8078 |
    | PRACS | 0.6624 | 0.6677 | 4.7022 | 4.8873 | 0.9121 | 35.9620 | 0.0147 | 0.1145 | 0.8727 |
    | AWFLN | 0.9133 | 0.9143 | 0.8205 | 0.6002 | 0.9914 | 49.1352 | 0.0191 | 0.0933 | 0.8893 |
    | FusionMamba | 0.9082 | 0.9120 | 0.8584 | 0.6179 | 0.9917 | 49.0684 | 0.0215 | 0.0635 | 0.9164 |
    | PanMamba | 0.7613 | 0.7606 | 2.7455 | 2.5952 | 0.9839 | 41.0233 | 0.0533 | 0.0897 | 0.8621 |
    | WFANet | 0.7608 | 0.7599 | 2.7611 | 2.6042 | 0.9838 | 40.9669 | 0.0457 | 0.0821 | 0.8762 |
    | TMDiff | 0.7561 | 0.7548 | 2.7895 | 2.6382 | 0.9831 | 40.8775 | 0.0567 | 0.0533 | 0.8931 |
    | Proposed | 0.7675 | 0.7673 | 2.6959 | 2.4934 | 0.9855 | 41.3410 | 0.0154 | 0.0288 | 0.9562 |
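The SAM (rad) column in Tables 2 and 3 is the mean per-pixel spectral angle between the fused and reference images; lower is better. A minimal reference implementation of the standard metric (an illustration, not the paper's evaluation code):

```python
import numpy as np

def sam_rad(ref, fused, eps=1e-12):
    """Mean Spectral Angle Mapper (radians) between two (bands, H, W) images."""
    r = ref.reshape(ref.shape[0], -1)
    f = fused.reshape(fused.shape[0], -1)
    cos = np.sum(r * f, axis=0) / (np.linalg.norm(r, axis=0) * np.linalg.norm(f, axis=0) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

# SAM is invariant to per-pixel spectral scaling: a fused image that only
# rescales intensities still yields an angle of ~0
x = np.random.default_rng(1).random((4, 8, 8))
angle = sam_rad(x, 0.5 * x)
```

Because SAM compares only the direction of each pixel's spectral vector, it isolates spectral distortion from intensity differences, which is why it complements intensity-based measures such as ERGAS and PSNR in the tables.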

    Table 4  PSNR results of the network-architecture ablation (dB)

    | No. | Degradation operator | Proximal network | Multi-scale architecture | GF-1 | QB | WV-2 | WV-4 |
    |---|---|---|---|---|---|---|---|
    | 1 | 2D | 2D | ✓ | 52.4677 | 50.0497 | 40.9120 | 43.5865 |
    | 2 | 3D | 2D | ✓ | 52.5387 | 50.0891 | 40.9077 | 43.7566 |
    | 3 | 3D | 3D | ✓ | 52.3679 | 50.1103 | 41.2707 | 43.9447 |
    | 4 | 3D | 3D | × | 51.7431 | 50.0722 | 41.2601 | 43.7500 |

    Table 5  PSNR results of the prompt-guidance-mechanism ablation (dB)

    | No. | Method | GF-1 | QB | WV-2 | WV-4 |
    |---|---|---|---|---|---|
    | 1 | Channel | 52.3885 | 50.1519 | 41.2963 | 44.0624 |
    | 2 | Spatial | 52.4803 | 50.1578 | 41.3295 | 44.1117 |
    | 3 | Spatial–channel | 52.5321 | 50.2629 | 41.3410 | 44.2020 |
    | 4 | Cross-attention | 51.3079 | 49.7873 | 41.0228 | 43.5290 |
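The spatial–channel variant that performs best in Table 5 can be pictured as projecting the text embedding into both a per-channel gain and a spatial gate that jointly modulate the feature tensor. The sketch below is a toy stand-in with hypothetical names and randomly initialized weights, not the paper's actual prompt-injection module:

```python
import numpy as np

def prompt_modulate(feat, text_emb, W_c, W_s):
    """Spatial-channel prompt injection: the text embedding is projected to
    per-channel gains (sigmoid of W_c @ t) and to a single-channel spatial
    gate (sigmoid of W_s @ t reshaped to H x W); both rescale the (C, H, W)
    feature tensor. W_c and W_s would be learned in practice."""
    C, H, W = feat.shape
    chan_gain = 1.0 / (1.0 + np.exp(-(W_c @ text_emb)))   # (C,)
    spat_gain = 1.0 / (1.0 + np.exp(-(W_s @ text_emb).reshape(H, W)))  # (H, W)
    return feat * chan_gain[:, None, None] * spat_gain[None, :, :]

rng = np.random.default_rng(2)
feat = rng.random((8, 4, 4))
t = rng.random(16)                           # pretend text embedding
W_c = rng.standard_normal((8, 16)) * 0.1     # channel projection
W_s = rng.standard_normal((16, 16)) * 0.1    # 4*4 spatial positions
out = prompt_modulate(feat, t, W_c, W_s)
```

Compared with the cross-attention variant in row 4, this kind of lightweight gating keeps the prompt's influence cheap and stable, which is consistent with the gap observed in the table.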
Publication history
  • Received: 2025-11-26
  • Revised: 2026-04-17
  • Accepted: 2026-04-17
  • Published online: 2026-05-03
