PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images

HAN Wenqi; JIANG Wen; GENG Jie; BAO Yanchen

doi:10.11999/JEIT251115

Volume 47 Issue 12

Dec. 2026

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2025 > 47(12): 4714-4727

HAN Wenqi, JIANG Wen, GENG Jie, BAO Yanchen. PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4714-4727. doi: 10.11999/JEIT251115

Citation:

HAN Wenqi, JIANG Wen, GENG Jie, BAO Yanchen. PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4714-4727. doi: 10.11999/JEIT251115

Citation:

HAN Wenqi, JIANG Wen, GENG Jie, BAO Yanchen. PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4714-4727. doi: 10.11999/JEIT251115

PDF( 7086 KB)

PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images

doi: 10.11999/JEIT251115 cstr: 32379.14.JEIT251115

HAN Wenqi^{1, 2},
JIANG Wen^{2
,
,},
GENG Jie²,
BAO Yanchen²

1.
China University of Petroleum(East China) Qingdao Software College, School of Computer Science and Technology, Qingdao 266580, China
2.
Northwestern Polytechnical University, Xi’an 710072, China

Funds: The National Natural Science Foundation of China(62571440)

Received Date: 2025-10-22
Rev Recd Date: 2026-01-11

Available Online: 2026-01-13

Publish Date: 2025-12-10

Abstract

Abstract

Objective The high annotation cost of remote sensing data and the heterogeneity between optical and SAR modalities limit the performance and scalability of semantic segmentation systems. This study examines a practical semi-supervised setting where only a small set of paired optical–SAR samples is labeled, whereas numerous single-modality SAR images remain unlabeled. The objective is to design a semi-supervised multimodal framework capable of learning discriminative and topology-consistent fused representations under sparse labels by aligning cross-modal semantics and preserving structural coherence through pseudo-supervision. The proposed Prototype Alignment and Topology Consistent (PATC) method aims to achieve robust land-cover segmentation on challenging multimodal datasets, improving region-level accuracy and connectivity-aware structure quality. Methods PATC adopts a teacher–student framework that exploits limited labeled optical–SAR pairs and abundant unlabeled SAR data. A shared semantic prototype space is first constructed to reduce modality gaps, where class prototypes are updated with a momentum mechanism for stability. A prototype-level contrastive alignment strategy enhances intra-class compactness and inter-class separability, guiding optical and SAR features of the same category to cluster around unified prototypes and improving cross-modal semantic consistency. To preserve structural integrity, a topology-consistent pseudo-supervision mechanism is incorporated. Inspired by persistent homology, a topology-aware loss constrains the teacher-generated pseudo-labels by penalizing errors such as incorrect formation or removal of connected components and holes. This structural constraint complements pixel-wise losses by maintaining boundary continuity and fine structures (e.g., roads and rivers), ensuring that pseudo-supervised learning remains geometrically and topologically coherent. Results and Discussions Experiments show that PATC reduces cross-modal semantic misalignment and topology degradation. By regularizing pseudo-labels with a topology-consistent loss derived from persistent homology, the method preserves connectivity and boundary integrity, especially for thin or fragmented structures. Evaluations on the WHU-OPT-SAR and Suzhou datasets demonstrate consistent improvements over state-of-the-art fully supervised and semi-supervised baselines under 1/16, 1/8, and 1/4 label regimes (Fig. 4, Fig. 5, Fig. 6; Table 3, Table 4). Ablation studies confirm the complementary roles of prototype alignment and topology regularization (Table 5). The findings indicate that unlabeled SAR data provides structural priors that, when used through topology-aware consistency and prototype-level alignment, substantially enhance multimodal fusion under sparse annotation. Conclusions This study proposes PATC, a multimodal semi-supervised semantic segmentation method that addresses limited annotations, modality misalignment, and weak generalization. PATC constructs multimodal semantic prototypes in a shared feature subspace and applies prototype-level contrastive learning to improve cross-modal consistency and feature discriminability. A topology-consistent loss based on persistent homology further regularizes the student network, improving the connectivity and structural stability of segmentation results. By incorporating structural priors from unlabeled SAR data within a teacher-student framework with EMA updates, PATC achieves robust multimodal feature fusion and accurate segmentation under scarce labels. Future work will expand topology-based pseudo-supervision to broader multimodal configurations and integrate active learning to refine pseudo-label quality.
- Semi-supervised learning,
- Semantic segmentation,
- Multimodal fusion,
- Multimodal remote sensing

FullText(HTML)

References(34)

References

[1]	要旭东, 郭雅萍, 刘梦阳, 等. 遥感图像中不确定性驱动的像素级对抗噪声检测方法[J]. 电子与信息学报, 2025, 47(6): 1633–1644. doi: 10.11999/JEIT241157. YAO Xudong, GUO Yaping, LIU Mengyang, et al. An uncertainty-driven pixel-level adversarial noise detection method for remote sensing images[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1633–1644. doi: 10.11999/JEIT241157.
[2]	尚可, 晏磊, 张飞舟, 等. 从BRDF到BPDF: 遥感反演基础模型的演进初探[J]. 中国科学: 信息科学, 2024, 54(8): 2001–2020. doi: 10.1360/SSI-2023-0193. SHANG Ke, YAN Lei, ZHANG Feizhou, et al. From BRDF to BPDF: A premilinary study on evolution of the basic remote sensing quantitative inversion model[J]. Scientia Sinica Informationis, 2024, 54(8): 2001–2020. doi: 10.1360/SSI-2023-0193.
[3]	刁文辉, 龚铄, 辛林霖, 等. 针对多模态遥感数据的自监督策略模型预训练方法[J]. 电子与信息学报, 2025, 47(6): 1658–1668. doi: 10.11999/JEIT241016. DIAO Wenhui, GONG Shuo, XIN Linlin, et al. A model pre-training method with self-supervised strategies for multimodal remote sensing data[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1658–1668. doi: 10.11999/JEIT241016.
[4]	TIAN Jiaqi, ZHU Xiaolin, SHEN Miaogen, et al. Effectiveness of spatiotemporal data fusion in fine-scale land surface phenology monitoring: A simulation study[J]. Journal of Remote Sensing, 2024, 4: 0118. doi: 10.34133/remotesensing.0118.
[5]	LIU Shuaijun, LIU Jia, TAN Xiaoyue, et al. A hybrid spatiotemporal fusion method for high spatial resolution imagery: Fusion of Gaofen-1 and Sentinel-2 over agricultural landscapes[J]. Journal of Remote Sensing, 2024, 4: 0159. doi: 10.34133/remotesensing.0159.
[6]	SHI Qian, HE Da, LIU Zhengyu, et al. Globe230k: A benchmark dense-pixel annotation dataset for global land cover mapping[J]. Journal of Remote Sensing, 2023, 3: 0078. doi: 10.34133/remotesensing.0078.
[7]	LIN Junyan, CHEN Haoran, FAN Yue, et al. Multi-layer visual feature fusion in multimodal LLMs: Methods, analysis, and best practices[C]. The 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 4156–4166. doi: 10.1109/CVPR52734.2025.00393.
[8]	MAO Shasha, LU Shiming, DU Zhaolong, et al. Cross-rejective open-set SAR image registration[C]. The 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 23027–23036. doi: 10.1109/CVPR52734.2025.02144.
[9]	WANG Benquan, AN Ruyi, SO J K, et al. OpticalNet: An optical imaging dataset and benchmark beyond the diffraction limit[C]. The 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 10900–10912. doi: 10.1109/CVPR52734.2025.01018.
[10]	高尚华, 周攀, 程明明, 等. 迈向可持续自监督学习: 基于目标增强的条件掩码重建自监督学习[J]. 中国科学: 信息科学, 2025, 55(2): 326–342. doi: 10.1360/SSI-2024-0176. GAO Shanghua, ZHOU Pan, CHENG Mingming, et al. Towards sustainable self-supervised learning: Target-enhanced conditional mask-reconstruction for self-supervised learning[J]. Scientia Sinica Informationis, 2025, 55(2): 326–342. doi: 10.1360/SSI-2024-0176.
[11]	毕秀丽, 徐培君, 范骏超, 等. 基于亲和向量一致性的弱监督语义分割[J]. 中国科学: 信息科学, 2025, 55(5): 1088–1107. doi: 10.1360/SSI-2024-0222. BI Xiuli, XU Peijun, FAN Junchao, et al. Weakly supervised semantic segmentation based on affinity vector consistency[J]. Scientia Sinica Informationis, 2025, 55(5): 1088–1107. doi: 10.1360/SSI-2024-0222.
[12]	HU Jie, CHEN Chen, CAO Liujuan, et al. Pseudo-label alignment for semi-supervised instance segmentation[C]. The 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 16291–16301. doi: 10.1109/ICCV51070.2023.01497.
[13]	CHENG Bowen, MISRA I, SCHWING A G, et al. Masked-attention mask transformer for universal image segmentation[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 1280–1289. doi: 10.1109/CVPR52688.2022.00135.
[14]	MEI Shaohui, LIAN Jiawei, WANG Xiaofei, et al. A comprehensive study on the robustness of deep learning-based image classification and object detection in remote sensing: Surveying and benchmarking[J]. Journal of Remote Sensing, 2024, 4: 0219. doi: 10.34133/remotesensing.0219.
[15]	WANG Haoyu and LI Xiaofeng. Expanding horizons: U-Net enhancements for semantic segmentation, forecasting, and super-resolution in ocean remote sensing[J]. Journal of Remote Sensing, 2024, 4: 0196. doi: 10.34133/remotesensing.0196.
[16]	XU Zhiyong, ZHANG Weicun, ZHANG Tianxiang, et al. HRCNet: High-resolution context extraction network for semantic segmentation of remote sensing images[J]. Remote Sensing, 2021, 13(1): 71. doi: 10.3390/rs13010071.
[17]	LI Rui, ZHENG Shunyi, ZHANG Ce, et al. Multiattention network for semantic segmentation of fine-resolution remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5607713. doi: 10.1109/TGRS.2021.3093977.
[18]	XIE Enze, WANG Wenhai, YU Zhiding, et al. SegFormer: Simple and efficient design for semantic segmentation with transformers[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 924. doi: 10.5555/3540261.3541185.
[19]	GAO Feng, JIN Xuepeng, ZHOU Xiaowei, et al. MSFMamba: Multiscale feature fusion state space model for multisource remote sensing image classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5504116. doi: 10.1109/TGRS.2025.3535622.
[20]	XU Xiaodong, LI Wei, RAN Qiong, et al. Multisource remote sensing data classification based on convolutional neural network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(2): 937–949. doi: 10.1109/TGRS.2017.2756851.
[21]	LI Xue, ZHANG Guo, CUI Hao, et al. MCANet: A joint semantic segmentation framework of optical and SAR images for land use classification[J]. International Journal of Applied Earth Observation and Geoinformation, 2022, 106: 102638. doi: 10.1016/j.jag.2021.102638.
[22]	ZHANG Jiaming, LIU Huayao, YANG Kailun, et al. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(12): 14679–14694. doi: 10.1109/TITS.2023.3300537.
[23]	OUALI Y, HUDELOT C, and TAMI M. Semi-supervised semantic segmentation with cross-consistency training[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 12671–12681. doi: 10.1109/CVPR42600.2020.01269.
[24]	LAI Xin, TIAN Zhuotao, JIANG Li, et al. Semi-supervised semantic segmentation with directional context-aware consistency[C]. The 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 1205–1214. doi: 10.1109/CVPR46437.2021.00126.
[25]	HAN Wenqi, GENG Jie, DENG Xinyang, et al. Enhancing multimodal fusion with only unimodal data[C]. IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 2024: 2962–2965. doi: 10.1109/IGARSS53475.2024.10641451.
[26]	JIANG Pengtao, ZHANG Changbin, HOU Qibin, et al. LayerCAM: Exploring hierarchical class activation maps for localization[J]. IEEE Transactions on Image Processing, 2021, 30: 5875–5888. doi: 10.1109/TIP.2021.3089943.
[27]	ZOU Yuliang, ZHANG Zizhao, ZHANG Han, et al. PseudoSeg: Designing pseudo labels for semantic segmentation[C]. 9th International Conference on Learning Representations, 2021.
[28]	YANG Lihe, ZHUO Wei, QI Lei, et al. ST++: Make self-trainingWork better for semi-supervised semantic segmentation[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 4258–4267. doi: 10.1109/CVPR52688.2022.00423.
[29]	ZOMORODIAN A and CARLSSON G. Computing persistent homology[C]. The 20th Annual Symposium on Computational Geometry, Brooklyn, USA, 2004: 347–356. doi: 10.1145/997817.997870.
[30]	HU Xiaoling, LI Fuxin, SAMARAS D, et al. Topology-preserving deep image segmentation[C]. The 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 508. doi: 10.5555/3454287.3454795.
[31]	KLINKER F. Exponential moving average versus moving exponential average[J]. Mathematische Semesterberichte, 2011, 58(1): 97–107. doi: 10.1007/s00591-010-0080-8.
[32]	KINGMA D P and BA J. Adam: A method for stochastic optimization[C]. 3rd International Conference on Learning Representations, San Diego, USA, 2015.
[33]	YIN Bowen, ZHANG Xuying, LI Zhongyu, et al. DFormer: Rethinking RGBD representation learning for semantic segmentation[C]. The 12th International Conference on Learning Representations, Vienna, Austria, 2024.
[34]	WAN Zifu, ZHANG Pingping, WANG Yuhao, et al. Sigma: Siamese mamba network for multi-modal semantic segmentation[C]. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, USA, 2025: 1734–1744. doi: 10.1109/WACV61041.2025.00176.