PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images

HAN Wenqi, JIANG Wen, GENG Jie, BAO Yanchen

Citation: HAN Wenqi, JIANG Wen, GENG Jie, BAO Yanchen. PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251115


doi: 10.11999/JEIT251115 cstr: 32379.14.JEIT251115
Funds: The National Natural Science Foundation of China (62571440)
Details
    About the authors:

    HAN Wenqi: male, lecturer; research interests include computer vision and semantic segmentation of multimodal remote sensing images

    JIANG Wen: female, professor; research interests include artificial intelligence and multimodal remote sensing image processing

    GENG Jie: male, associate professor; research interests include computer vision and multimodal remote sensing image processing

    BAO Yanchen: male, master's student; research interests include intelligent processing of multimodal remote sensing images

    Corresponding author:

    JIANG Wen, jiangwen@nwpu.edu.cn

  • CLC number: TN911.73

  • Abstract: In remote sensing image semantic segmentation, modality heterogeneity and high annotation cost are the main bottlenecks limiting model performance. To address the scarcity of labeled samples in multimodal remote sensing data, this paper proposes a multimodal semi-supervised semantic segmentation method under prototype alignment and topology-consistency constraints. Using unlabeled SAR images as auxiliary information, the method builds a teacher–student framework and introduces a multimodal class-prototype alignment mechanism together with a topology-consistent pseudo-supervision strategy, improving the discriminability and structural stability of the fused features. First, shared semantic prototypes for the optical and SAR modalities are constructed, and cross-modal semantic consistency is learned through a contrastive loss. Second, a topology loss based on persistent homology is designed to refine pseudo-label quality at the structural level, effectively alleviating the topological degradation that arises during pseudo-supervision. Experiments on two multimodal remote sensing datasets, the public WHU-OPT-SAR dataset and the self-built Suzhou dataset (https://www.scidb.cn/detail?dataSetId=a55977a3a8d849a992cbb51e426370a8&version=V1&code=j00173), show that the method achieves excellent segmentation performance and good generalization even under limited annotation.
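The teacher–student framework with cross-modal prototype alignment described in the abstract can be sketched in generic terms. The numpy snippet below illustrates three standard ingredients such a pipeline typically uses: an EMA (Mean-Teacher style) teacher update, per-class prototype pooling, and an InfoNCE-style contrastive term that pulls same-class optical/SAR prototypes together. All function names and the exact loss form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Exponential-moving-average teacher update (Mean-Teacher rule:
    t <- m * t + (1 - m) * s), applied parameter-by-parameter."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

def class_prototypes(features, labels, num_classes):
    """Pool (N, D) pixel embeddings into one mean feature vector per class."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def prototype_alignment_loss(protos_opt, protos_sar, tau=0.1):
    """InfoNCE-style contrastive term: the optical prototype of class c
    should be most similar to the SAR prototype of the same class c."""
    a = protos_opt / np.linalg.norm(protos_opt, axis=1, keepdims=True)
    b = protos_sar / np.linalg.norm(protos_sar, axis=1, keepdims=True)
    logits = a @ b.T / tau                       # (C, C) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # diagonal = matched classes
```

In this sketch the loss is minimized when each optical prototype is closest to the SAR prototype of the same class, which is the qualitative behavior the abstract's "cross-modal semantic consistency" objective describes.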
  • Figure 1  Illustration of topological interruption and connectivity in semantic segmentation

    Figure 2  Framework of the proposed multimodal semi-supervised semantic segmentation method under prototype alignment and topology-consistency constraints

    Figure 3  Evolution of topological structure under different thresholds and the matching mechanism of the topology loss

    Figure 4  Sample images from the WHU-OPT-SAR multimodal remote sensing dataset

    Figure 5  Sample images from the Suzhou multimodal remote sensing dataset

    Figure 6  Visual comparison of different methods on the Suzhou dataset
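The threshold-evolution idea behind the topology loss (Figure 3) rests on how connected components of a thresholded prediction map appear and merge as the threshold varies, which is the 0-dimensional part of persistent homology. The toy sketch below only counts 4-connected components of the superlevel set at a few thresholds using a small union-find; it is an assumption-laden illustration of the underlying structure, not the paper's loss, which matches persistence diagrams rather than raw component counts.

```python
import numpy as np

def count_components(mask):
    """Count 4-connected foreground components of a boolean mask (union-find)."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j]:
                parent[(i, j)] = (i, j)
                # merge with already-visited foreground neighbors (up, left)
                if i > 0 and mask[i - 1, j]:
                    parent[find((i, j))] = find((i - 1, j))
                if j > 0 and mask[i, j - 1]:
                    parent[find((i, j))] = find((i, j - 1))
    return len({find(x) for x in parent})

def betti0_curve(prob, thresholds):
    """Component count of the superlevel set {prob >= t} at each threshold t,
    i.e. the 0-dimensional Betti number along the filtration."""
    return [count_components(prob >= t) for t in thresholds]
```

Sweeping the threshold downward, isolated high-confidence regions gradually merge into fewer components; a topology-aware loss penalizes predictions whose component birth/death pattern disagrees with the pseudo-label's.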

    Table 1  Numbers of training and test images in the WHU-OPT-SAR dataset

    | Labeled ratio | Train (labeled) | Train (unlabeled) | Test (labeled) |
    |---|---|---|---|
    | 1/4 | 1408 | 4224 | 1408 |
    | 1/8 | 704 | 4928 | 1408 |
    | 1/16 | 352 | 5280 | 1408 |

    Table 2  Numbers of training and test images in the Suzhou dataset

    | Labeled ratio | Train (labeled) | Train (unlabeled) | Test (labeled) |
    |---|---|---|---|
    | 1/4 | 124 | 372 | 125 |
    | 1/8 | 62 | 434 | 125 |
    | 1/16 | 31 | 465 | 125 |

    Table 3  Performance comparison on the WHU-OPT-SAR dataset under different labeled ratios (%)

    | Method | 1/16: mIoU / FWIoU / OA | 1/8: mIoU / FWIoU / OA | 1/4: mIoU / FWIoU / OA |
    |---|---|---|---|
    | MCANet | 38.34 / 61.40 / 75.66 | 43.07 / 64.28 / 77.84 | 48.21 / 67.11 / 79.89 |
    | CMX | 38.92 / 62.27 / 76.39 | 43.55 / 65.46 / 78.92 | 51.28 / 68.51 / 80.93 |
    | DFormer | 35.72 / 60.78 / 75.29 | 40.20 / 62.42 / 76.5 | 44.99 / 65.74 / 78.93 |
    | Sigma | 32.07 / 58.83 / 73.66 | 37.79 / 62.14 / 76.22 | 41.36 / 64.04 / 77.62 |
    | PATC (w/o SAR) | 43.85 / 65.31 / 78.74 | 48.3 / 67.54 / 80.29 | 52.68 / 69.6 / 81.67 |
    | ST++ | 46.74 / 66.71 / 79.72 | 52.28 / 69.39 / 81.54 | 55.64 / 71.21 / 82.88 |
    | MPRFN | 46.76 / 66.47 / 79.45 | 48.70 / 67.96 / 80.59 | 54.29 / 70.28 / 82.13 |
    | PATC | 53.08 / 69.68 / 81.87 | 54.99 / 70.77 / 82.55 | 56.46 / 72.00 / 83.44 |

    Table 4  Performance comparison on the Suzhou dataset under different labeled ratios (%)

    | Method | 1/16: mIoU / FWIoU / OA | 1/8: mIoU / FWIoU / OA | 1/4: mIoU / FWIoU / OA |
    |---|---|---|---|
    | MCANet | 49.50 / 56.74 / 71.58 | 54.79 / 64.68 / 77.44 | 56.76 / 66.59 / 78.91 |
    | CMX | 52.12 / 59.46 / 73.57 | 59.99 / 67.90 / 79.62 | 61.97 / 69.65 / 81.01 |
    | DFormer | 43.74 / 51.82 / 67.75 | 51.12 / 58.55 / 72.97 | 56.86 / 63.74 / 76.83 |
    | Sigma | 40.75 / 49.74 / 65.46 | 47.52 / 55.77 / 70.20 | 51.49 / 59.22 / 73.05 |
    | PATC (w/o SAR) | 58.15 / 64.43 / 77.27 | 60.99 / 69.12 / 80.71 | 64.21 / 71.06 / 82.07 |
    | ST++ | 56.76 / 66.59 / 78.91 | 63.81 / 70.74 / 81.83 | 65.25 / 71.58 / 82.48 |
    | MPRFN | 58.90 / 64.98 / 77.70 | 63.47 / 70.85 / 81.82 | 64.58 / 71.35 / 82.21 |
    | PATC | 60.37 / 68.69 / 80.39 | 64.60 / 71.26 / 82.22 | 67.68 / 73.54 / 83.82 |

    Table 5  Effect of each loss term and the semi-supervised strategy on model performance (%) (the check marks indicating which of the semi-supervised strategy, $ \mathcal{L}_{\mathrm{p}} $, and $ \mathcal{L}_{\mathrm{t}} $ are enabled in each row did not survive extraction)

    | Semi-supervised | $ \mathcal{L}_{\mathrm{p}} $ | $ \mathcal{L}_{\mathrm{t}} $ | Water | Forest | Farmland | Road | Building | Unused land | mIoU | FWIoU | OA |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    |  |  |  | 89.56 | 29.12 | 83.34 | 38.09 | 74.49 | 51.35 | 60.99 | 69.12 | 80.71 |
    |  |  |  | 89.15 | 36.65 | 83.96 | 42.47 | 72.35 | 53.08 | 62.94 | 70.01 | 81.1 |
    |  |  |  | 89.03 | 38.17 | 84.24 | 41.36 | 73.71 | 52.45 | 63.16 | 70.24 | 81.5 |
    |  |  |  | 90.23 | 36.38 | 84.82 | 47.15 | 75.05 | 54.01 | 64.60 | 71.26 | 82.22 |

    Table 6  Model complexity and computational efficiency

    | Method | Avg. training time (s) | Parameters (M) | FLOPs (G) |
    |---|---|---|---|
    | CMX | 201.1 | 49.65 | 57.44 |
    | Sigma | 205.8 | 60.60 | 71.71 |
    | MPRFN | 322.5 | 88.11 | 101.07 |
    | PATC (ours) | 221.6 | 74.82 | 79.15 |
  • [1] YAO Xudong, GUO Yaping, LIU Mengyang, et al. An uncertainty-driven pixel-level adversarial noise detection method for remote sensing images[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1633–1644. doi: 10.11999/JEIT241157.
    [2] SHANG Ke, YAN Lei, ZHANG Feizhou, et al. From BRDF to BPDF: A preliminary study on the evolution of the basic remote sensing quantitative inversion model[J]. Scientia Sinica Informationis, 2024, 54(8): 2001–2020. doi: 10.1360/SSI-2023-0193.
    [3] DIAO Wenhui, GONG Shuo, XIN Linlin, et al. A model pre-training method with self-supervised strategies for multimodal remote sensing data[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1658–1668. doi: 10.11999/JEIT241016.
    [4] TIAN Jiaqi, ZHU Xiaolin, SHEN Miaogen, et al. Effectiveness of spatiotemporal data fusion in fine-scale land surface phenology monitoring: A simulation study[J]. Journal of Remote Sensing, 2024, 4: 0118. doi: 10.34133/remotesensing.0118.
    [5] LIU Shuaijun, LIU Jia, TAN Xiaoyue, et al. A hybrid spatiotemporal fusion method for high spatial resolution imagery: Fusion of Gaofen-1 and Sentinel-2 over agricultural landscapes[J]. Journal of Remote Sensing, 2024, 4: 0159. doi: 10.34133/remotesensing.0159.
    [6] SHI Qian, HE Da, LIU Zhengyu, et al. Globe230k: A benchmark dense-pixel annotation dataset for global land cover mapping[J]. Journal of Remote Sensing, 2023, 3: 0078. doi: 10.34133/remotesensing.0078.
    [7] LIN Junyan, CHEN Haoran, FAN Yue, et al. Multi-layer visual feature fusion in multimodal LLMs: Methods, analysis, and best practices[C]. The 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 4156–4166. doi: 10.1109/CVPR52734.2025.00393.
    [8] MAO Shasha, LU Shiming, DU Zhaolong, et al. Cross-rejective open-set SAR image registration[C]. The 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 23027–23036. doi: 10.1109/CVPR52734.2025.02144.
    [9] WANG Benquan, AN Ruyi, SO J K, et al. OpticalNet: An optical imaging dataset and benchmark beyond the diffraction limit[C]. The 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 10900–10912. doi: 10.1109/CVPR52734.2025.01018.
    [10] GAO Shanghua, ZHOU Pan, CHENG Mingming, et al. Towards sustainable self-supervised learning: Target-enhanced conditional mask-reconstruction for self-supervised learning[J]. Scientia Sinica Informationis, 2025, 55(2): 326–342. doi: 10.1360/SSI-2024-0176.
    [11] BI Xiuli, XU Peijun, FAN Junchao, et al. Weakly supervised semantic segmentation based on affinity vector consistency[J]. Scientia Sinica Informationis, 2025, 55(5): 1088–1107. doi: 10.1360/SSI-2024-0222.
    [12] HU Jie, CHEN Chen, CAO Liujuan, et al. Pseudo-label alignment for semi-supervised instance segmentation[C]. The 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 16291–16301. doi: 10.1109/ICCV51070.2023.01497.
    [13] CHENG Bowen, MISRA I, SCHWING A G, et al. Masked-attention mask transformer for universal image segmentation[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 1280–1289. doi: 10.1109/CVPR52688.2022.00135.
    [14] MEI Shaohui, LIAN Jiawei, WANG Xiaofei, et al. A comprehensive study on the robustness of deep learning-based image classification and object detection in remote sensing: Surveying and benchmarking[J]. Journal of Remote Sensing, 2024, 4: 0219. doi: 10.34133/remotesensing.0219.
    [15] WANG Haoyu and LI Xiaofeng. Expanding horizons: U-Net enhancements for semantic segmentation, forecasting, and super-resolution in ocean remote sensing[J]. Journal of Remote Sensing, 2024, 4: 0196. doi: 10.34133/remotesensing.0196.
    [16] XU Zhiyong, ZHANG Weicun, ZHANG Tianxiang, et al. HRCNet: High-resolution context extraction network for semantic segmentation of remote sensing images[J]. Remote Sensing, 2021, 13(1): 71. doi: 10.3390/rs13010071.
    [17] LI Rui, ZHENG Shunyi, ZHANG Ce, et al. Multiattention network for semantic segmentation of fine-resolution remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5607713. doi: 10.1109/TGRS.2021.3093977.
    [18] XIE Enze, WANG Wenhai, YU Zhiding, et al. SegFormer: Simple and efficient design for semantic segmentation with transformers[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 924. doi: 10.5555/3540261.3541185.
    [19] GAO Feng, JIN Xuepeng, ZHOU Xiaowei, et al. MSFMamba: Multiscale feature fusion state space model for multisource remote sensing image classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5504116. doi: 10.1109/TGRS.2025.3535622.
    [20] XU Xiaodong, LI Wei, RAN Qiong, et al. Multisource remote sensing data classification based on convolutional neural network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(2): 937–949. doi: 10.1109/TGRS.2017.2756851.
    [21] LI Xue, ZHANG Guo, CUI Hao, et al. MCANet: A joint semantic segmentation framework of optical and SAR images for land use classification[J]. International Journal of Applied Earth Observation and Geoinformation, 2022, 106: 102638. doi: 10.1016/j.jag.2021.102638.
    [22] ZHANG Jiaming, LIU Huayao, YANG Kailun, et al. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(12): 14679–14694. doi: 10.1109/TITS.2023.3300537.
    [23] OUALI Y, HUDELOT C, and TAMI M. Semi-supervised semantic segmentation with cross-consistency training[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 12671–12681. doi: 10.1109/CVPR42600.2020.01269.
    [24] LAI Xin, TIAN Zhuotao, JIANG Li, et al. Semi-supervised semantic segmentation with directional context-aware consistency[C]. The 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 1205–1214. doi: 10.1109/CVPR46437.2021.00126.
    [25] HAN Wenqi, GENG Jie, DENG Xinyang, et al. Enhancing multimodal fusion with only unimodal data[C]. IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 2024: 2962–2965. doi: 10.1109/IGARSS53475.2024.10641451.
    [26] JIANG Pengtao, ZHANG Changbin, HOU Qibin, et al. LayerCAM: Exploring hierarchical class activation maps for localization[J]. IEEE Transactions on Image Processing, 2021, 30: 5875–5888. doi: 10.1109/TIP.2021.3089943.
    [27] ZOU Yuliang, ZHANG Zizhao, ZHANG Han, et al. PseudoSeg: Designing pseudo labels for semantic segmentation[C]. 9th International Conference on Learning Representations, 2021.
    [28] YANG Lihe, ZHUO Wei, QI Lei, et al. ST++: Make self-training work better for semi-supervised semantic segmentation[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 4258–4267. doi: 10.1109/CVPR52688.2022.00423.
    [29] ZOMORODIAN A and CARLSSON G. Computing persistent homology[C]. The 20th Annual Symposium on Computational Geometry, Brooklyn, USA, 2004: 347–356. doi: 10.1145/997817.997870.
    [30] HU Xiaoling, LI Fuxin, SAMARAS D, et al. Topology-preserving deep image segmentation[C]. The 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 508. doi: 10.5555/3454287.3454795.
    [31] KLINKER F. Exponential moving average versus moving exponential average[J]. Mathematische Semesterberichte, 2011, 58(1): 97–107. doi: 10.1007/s00591-010-0080-8.
    [32] KINGMA D P and BA J. Adam: A method for stochastic optimization[C]. 3rd International Conference on Learning Representations, San Diego, USA, 2015.
    [33] YIN Bowen, ZHANG Xuying, LI Zhongyu, et al. DFormer: Rethinking RGBD representation learning for semantic segmentation[C]. The 12th International Conference on Learning Representations, Vienna, Austria, 2024.
    [34] WAN Zifu, ZHANG Pingping, WANG Yuhao, et al. Sigma: Siamese mamba network for multi-modal semantic segmentation[C]. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, USA, 2025: 1734–1744. doi: 10.1109/WACV61041.2025.00176.
Publication history
  • Received: 2025-10-22
  • Revised: 2026-01-11
  • Published online: 2026-01-13
