Spatial Information-Guided Diffusion for Remote Sensing Image Domain Adaptation Semantic Segmentation
Abstract: To improve cross-domain adaptation in Domain Adaptive Semantic Segmentation (DASS) of remote sensing images, this paper proposes Co-Training Spatial-Guided DASS (CoSG-DASS), a domain-adaptive semantic segmentation framework built on co-training and a spatially guided diffusion model. CoSG-DASS uses the spatially guided diffusion model for image translation to reduce inter-domain differences, and uses co-training to strengthen the model's adaptation to the target domain. In the image-translation stage, a novel spatial-information-guided diffusion model is designed: pseudo-labels encoding the horizontal semantic distribution and depth estimates encoding the vertical semantic distribution are injected into a latent diffusion model to reconstruct spatial guidance, achieving semantically unbiased translation from the source to the target domain. To address pseudo-label quality, an entropy-based adaptive guidance-intensity module is proposed, which selects high-confidence region features by entropy to suppress noise and effectively improves semantic alignment under cross-domain imaging differences. In the co-training stage, a training strategy fusing depth information with an adversarial loss is proposed; by enriching multi-dimensional knowledge representation, it reduces intra-class variance and enlarges inter-class separation, improving cross-domain adaptability. Experiments on three typical remote sensing cross-domain tasks (cross-geography, cross-imaging-mode, and label semantic heterogeneity) show that the proposed CoSG-DASS performs strongly, improving mean Intersection over Union (mIoU) over existing methods by 1.14%, 3.78%, and 2.49%, respectively.
Objective: Domain Adaptive Semantic Segmentation (DASS) is critical for remote sensing applications including land cover mapping, urban planning, and environmental monitoring. However, deep learning models suffer severe performance degradation under domain shifts caused by imaging variations, geographic differences, and label semantic heterogeneity. Traditional alignment and GAN-based methods often fail to preserve semantic consistency and are sensitive to noisy supervision, especially under large cross-domain gaps. This work aims to build a robust DASS framework for semantically consistent image translation and effective knowledge transfer.

Methods: We propose Co-training Spatial-Guided DASS (CoSG-DASS), a two-stage framework integrating image translation and collaborative training. A spatial-guided latent diffusion model enhanced with ControlNet is designed, using pseudo-labels and depth estimates as spatial conditions. To mitigate noisy pseudo-labels, we introduce an Entropy-based Adaptive Guidance Intensity Module (EAGIM) that evaluates pixel confidence via information entropy and suppresses unreliable features. In collaborative training, translated target-style images and unlabeled real target images jointly train a segmentation model with a depth-aware decoder, using a hybrid loss combining cross-entropy and adversarial terms.

Results and Discussions: Extensive experiments verify CoSG-DASS on three cross-domain tasks. It generates images closer to target-domain distributions, and quantitative results (FID, IS) demonstrate its superiority over CycleGAN, UNI-Diff, and CRS-Diff (Table 1). Visual comparisons (Fig. 6) show that our method reduces edge blur and category confusion, e.g., better distinguishing roads from vegetation and preserving small objects such as vehicles. In the semantic segmentation stage, CoSG-DASS significantly outperforms state-of-the-art domain adaptation methods, achieving mIoU improvements of 1.14%, 3.78%, and 2.49% on the cross-geography (Vaihingen IRRG → Potsdam IRRG), cross-modality (Vaihingen IRRG → Potsdam RGB), and label-heterogeneity (DFC25 ↔ LoveDA) tasks, respectively (Table 2, Table 3, Table 4). Segmentation visualizations (Fig. 7) confirm strong boundary preservation and accuracy in complex scenes. Ablation studies (Table 5) verify the contribution of each core component (depth control, pseudo-label guidance, EAGIM, and the co-training strategy). UMAP feature visualization (Fig. 8) further illustrates that CoSG-DASS compacts intra-class features and separates inter-class features after adaptation.

Conclusions: CoSG-DASS effectively alleviates domain shift in remote sensing images via semantic-preserving diffusion translation and depth-aware co-training. It outperforms existing methods in both translation quality and segmentation accuracy, providing a valuable solution for multi-source remote sensing interpretation. Future work will target extreme label heterogeneity and lightweight diffusion designs.
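To make the EAGIM idea concrete, the following is a minimal PyTorch sketch of how entropy-derived confidence could gate the spatial condition (pseudo-label plus depth) fed to the ControlNet branch. All tensor names, the threshold `tau`, and the one-hot-plus-depth layout are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def eagim_weight(seg_logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Per-pixel confidence weight from prediction entropy (illustrative).

    seg_logits: (B, C, H, W) raw scores of the pseudo-labeling network.
    Returns (B, 1, H, W) weights in [0, 1]; high-entropy (unreliable)
    pixels are zeroed so they contribute no spatial guidance.
    """
    probs = F.softmax(seg_logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1, keepdim=True)
    entropy = entropy / torch.log(torch.tensor(float(seg_logits.size(1))))  # normalize to [0, 1]
    confidence = 1.0 - entropy
    # Hard-gate pixels below the confidence threshold; elsewhere keep a soft weight.
    return torch.where(confidence >= tau, confidence, torch.zeros_like(confidence))

def build_spatial_condition(seg_logits: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
    """Concatenate the pseudo-label map (horizontal semantics) and the depth
    estimate (vertical semantics), gated by the EAGIM weight.

    depth: (B, 1, H, W) depth estimate, assumed pre-normalized.
    """
    num_classes = seg_logits.size(1)
    pseudo = F.one_hot(seg_logits.argmax(dim=1), num_classes).permute(0, 3, 1, 2).float()
    cond = torch.cat([pseudo, depth], dim=1)   # (B, C+1, H, W) spatial condition
    return cond * eagim_weight(seg_logits)     # weight broadcasts over channels
```

The default `tau = 0.5` mirrors the best-performing EAGIM coefficient in Table 6, on the assumption that the coefficient plays this thresholding role.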
Table 1 Quantitative comparison of image translation quality
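The FID and IS scores that Table 1 compares can be computed with standard tooling; the sketch below uses `torchmetrics` (our choice; the paper does not name its implementation), with randomly generated placeholder batches standing in for the target-domain reference images and the translated images.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Placeholder batches: (N, 3, H, W) uint8 tensors in [0, 255].
real_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

# FID: distance between Inception features of target-domain and translated images.
fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)   # target-domain reference set
fid.update(fake_images, real=False)  # translated (source -> target style) set
print("FID:", fid.compute().item())

# IS: quality/diversity of the translated images alone.
inception = InceptionScore()
inception.update(fake_images)
is_mean, is_std = inception.compute()
print("IS:", is_mean.item(), "+/-", is_std.item())
```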
Table 2 Semantic segmentation comparison for the cross-geography scenario, Vaihingen IRRG (source) → Potsdam IRRG (target) (%)

| Task | Adaptation method | Impervious surface | Building | Low vegetation | Tree | Car | Clutter | mIoU |
|---|---|---|---|---|---|---|---|---|
| Vaihingen IRRG → Potsdam IRRG | No adaptation | 57.82 | 44.82 | 22.18 | 28.61 | 69.60 | 23.31 | 44.61 |
| | Source only | 86.49 | 90.71 | 73.70 | 73.62 | 74.20 | 72.01 | 79.74 |
| | CycleGAN [11] | 58.01 | 67.46 | 38.61 | 39.19 | 57.02 | 22.06 | 52.06 |
| | DiscoGAN [20] | 59.08 | 66.61 | 44.94 | 34.78 | 54.23 | 16.35 | 51.93 |
| | CyCADA [12] | 12.55 | 29.24 | 47.38 | 51.27 | 3.53 | 9.83 | 28.79 |
| | CLAN [21] | 66.29 | 72.66 | 42.72 | 45.48 | 67.46 | 21.44 | 58.92 |
| | FADA [22] | 65.73 | 73.96 | 43.36 | 46.72 | 59.03 | 19.03 | 57.76 |
| | CorDA [10] | 67.39 | 69.43 | 54.83 | 48.67 | 68.52 | 9.30 | 61.77 |
| | AdaptSegNet [23] | 66.33 | 70.80 | 52.03 | 44.61 | 70.10 | 21.80 | 60.77 |
| | DAFormer [24] | 68.45 | 76.18 | 53.67 | 49.52 | 68.93 | 20.12 | 62.85 |
| | DIFF [25] | 67.82 | 73.54 | 52.51 | 49.03 | 68.21 | 17.84 | 62.10 |
| | UNI-Diff [14] | 69.81 | 78.73 | 51.95 | 50.28 | 67.84 | 23.57 | 63.72 |
| | CRS-Diff [15] | 67.81 | 80.45 | 58.40 | 50.74 | 69.44 | 18.56 | 65.37 |
| | Ours (CoSG-DASS) | 70.27 | 79.78 | 54.99 | 55.70 | 71.81 | 21.71 | 66.51 |

Table 3 Semantic segmentation comparison for the cross-imaging-mode scenario, Vaihingen IRRG (source) → Potsdam RGB (target) (%)
| Task | Adaptation method | Impervious surface | Building | Low vegetation | Tree | Car | Clutter | mIoU |
|---|---|---|---|---|---|---|---|---|
| Vaihingen IRRG → Potsdam RGB | No adaptation | 50.36 | 43.60 | 1.81 | 2.35 | 69.64 | 11.89 | 33.55 |
| | Source only | 86.49 | 90.71 | 73.70 | 73.62 | 74.20 | 72.01 | 79.74 |
| | CycleGAN [11] | 53.80 | 64.03 | 32.35 | 33.05 | 50.78 | 17.68 | 46.80 |
| | DiscoGAN [20] | 65.08 | 69.93 | 49.13 | 41.16 | 59.49 | 18.89 | 56.92 |
| | CyCADA [12] | 35.29 | 42.24 | 26.51 | 24.70 | 6.86 | 7.82 | 27.12 |
| | CLAN [21] | 62.98 | 71.91 | 47.51 | 47.77 | 66.19 | 10.91 | 59.27 |
| | FADA [22] | 63.99 | 73.56 | 37.11 | 39.57 | 59.01 | 15.27 | 54.65 |
| | CorDA [10] | 65.93 | 72.52 | 51.80 | 46.47 | 64.63 | 15.78 | 60.27 |
| | AdaptSegNet [23] | 58.65 | 65.31 | 37.95 | 39.34 | 69.70 | 14.34 | 54.19 |
| | DAFormer [24] | 63.25 | 71.38 | 48.76 | 46.82 | 65.74 | 14.85 | 58.80 |
| | DIFF [25] | 64.87 | 74.06 | 50.24 | 47.15 | 66.82 | 15.93 | 59.85 |
| | UNI-Diff [14] | 60.27 | 72.63 | 42.51 | 45.64 | 68.67 | 11.47 | 57.94 |
| | CRS-Diff [15] | 59.63 | 76.01 | 44.37 | 43.38 | 69.41 | 13.22 | 58.56 |
| | Ours (CoSG-DASS) | 66.80 | 79.80 | 50.20 | 44.78 | 70.15 | 6.56 | 62.34 |

Table 4 Semantic segmentation comparison for the label-heterogeneity scenario (mutual transfer between DFC25 and LoveDA) (%)
| Task | Adaptation method | Background | Building | Road | Water | Barren/Developed | Forest/Tree | Agriculture | mIoU |
|---|---|---|---|---|---|---|---|---|---|
| DFC25 → LoveDA | No adaptation | 24.83 | 53.47 | 41.57 | 64.90 | 1.13 | 32.88 | 18.30 | 36.46 |
| | Source only | 76.15 | 81.78 | 68.86 | 88.50 | 71.74 | 83.60 | 72.91 | 78.44 |
| | CycleGAN [11] | 23.19 | 53.31 | 43.69 | 65.50 | 3.79 | 32.53 | 21.85 | 37.00 |
| | DiscoGAN [20] | 23.45 | 50.34 | 42.72 | 56.92 | 1.35 | 28.57 | 12.43 | 33.89↓ |
| | CyCADA [12] | 26.30 | 46.72 | 41.11 | 52.93 | 0.78 | 32.78 | 14.04 | 33.43↓ |
| | CLAN [21] | 26.99 | 52.08 | 39.79 | 68.76 | 0.77 | 31.85 | 15.61 | 36.71 |
| | FADA [22] | 24.85 | 52.41 | 42.07 | 61.32 | 1.14 | 33.04 | 18.22 | 35.81↓ |
| | CorDA [10] | 37.50 | 38.77 | 23.25 | 63.56 | 3.32 | 64.07 | 7.69 | 38.41 |
| | AdaptSegNet [23] | 25.69 | 52.80 | 46.38 | 57.42 | 1.63 | 36.33 | 19.47 | 36.71 |
| | DAFormer [24] | 24.97 | 53.11 | 48.25 | 70.12 | 2.18 | 34.86 | 20.35 | 38.25 |
| | DIFF [25] | 23.86 | 52.94 | 47.53 | 69.87 | 1.95 | 34.22 | 19.87 | 37.75 |
| | UNI-Diff [14] | 21.44 | 53.07 | 45.70 | 70.69 | 2.36 | 34.75 | 20.68 | 38.00 |
| | CRS-Diff [15] | 20.86 | 52.29 | 46.61 | 73.46 | 2.54 | 35.72 | 18.23 | 38.58 |
| | Ours (CoSG-DASS) | 25.82 | 53.24 | 49.67 | 71.64 | 1.27 | 33.94 | 20.46 | 39.26 |
| LoveDA → DFC25 | No adaptation | 22.28 | 48.03 | 37.42 | 45.85 | 9.13 | 47.57 | 3.43 | 35.05 |
| | Source only | 40.42 | 63.82 | 61.46 | 71.09 | 33.08 | 43.99 | 43.17 | 52.31 |
| | CycleGAN [11] | 22.07 | 53.48 | 29.43 | 58.74 | 4.95 | 29.17 | 8.69 | 32.97↓ |
| | DiscoGAN [20] | 24.10 | 48.54 | 30.85 | 55.80 | 6.20 | 52.43 | 5.16 | 36.32 |
| | CyCADA [12] | 21.60 | 50.31 | 32.12 | 39.50 | 4.05 | 49.37 | 6.99 | 32.82↓ |
| | CLAN [21] | 22.24 | 49.10 | 33.25 | 41.58 | 6.26 | 46.73 | 11.66 | 33.19↓ |
| | FADA [22] | 28.23 | 48.40 | 32.32 | 53.69 | 4.61 | 45.78 | 10.64 | 35.50 |
| | CorDA [10] | 24.07 | 39.07 | 31.09 | 49.78 | 7.74 | 22.77 | 26.23 | 29.09↓ |
| | AdaptSegNet [23] | 21.59 | 47.48 | 29.72 | 51.83 | 11.05 | 54.99 | 6.21 | 36.11 |
| | DAFormer [24] | 30.08 | 49.82 | 33.56 | 65.43 | 5.17 | 59.04 | 11.30 | 39.20 |
| | DIFF [25] | 29.54 | 48.76 | 32.18 | 64.05 | 3.82 | 58.47 | 10.68 | 38.50 |
| | UNI-Diff [14] | 29.22 | 48.27 | 31.03 | 63.32 | 2.63 | 58.32 | 3.22 | 38.80 |
| | CRS-Diff [15] | 30.63 | 50.65 | 31.01 | 60.90 | 5.04 | 59.62 | 10.57 | 39.64 |
| | Ours (CoSG-DASS) | 31.55 | 50.19 | 35.47 | 70.07 | 6.20 | 59.32 | 14.40 | 42.13 |

(↓ marks mIoU below the no-adaptation baseline, i.e., negative transfer.)

Table 5 Ablation study of the core modules
| Task | Baseline | Depth control | Pseudo-label | EAGIM | Co-training | mIoU (%) | Gain (%) |
|---|---|---|---|---|---|---|---|
| Vaihingen IRRG → Potsdam RGB | √ | | | | | 33.55 | – |
| | √ | √ | | | | 55.64 | 22.09 |
| | √ | | √ | | | 50.16 | 16.61 |
| | √ | √ | √ | √ | | 58.23 | 24.68 |
| | √ | | | | √ | 35.76 | 2.21 |
| | √ | √ | √ | √ | √ | 62.34 | 28.79 |
| DFC25 → LoveDA | √ | | | | | 36.46 | – |
| | √ | √ | | | | 37.73 | 1.27 |
| | √ | | √ | | | 37.19 | 0.73 |
| | √ | √ | √ | √ | | 38.76 | 2.30 |
| | √ | | | | √ | 37.36 | 0.90 |
| | √ | √ | √ | √ | √ | 39.26 | 2.80 |

Table 6 Effect of the EAGIM coefficient on model accuracy (mIoU, %)
| EAGIM coefficient | Cross-geography | Cross-imaging-mode | Label heterogeneity |
|---|---|---|---|
| 0.3 | 64.86 | 60.12 | 37.06 |
| 0.5 | 66.51 | 62.08 | 42.13 |
| 0.7 | 65.72 | 60.65 | 41.95 |
| 0.9 | 65.13 | 59.43 | 39.63 |

Table 7 Effect of the $\lambda_{\text{adv}}$ parameter on model accuracy (mIoU, %)
| $\lambda_{\text{adv}}$ | 0.05 | 0.10 | 0.15 | 0.20 | 0.50 |
|---|---|---|---|---|---|
| Vaihingen IRRG → Potsdam RGB | 62.20 | 62.34 | 62.33 | 61.31 | 61.74 |
| Vaihingen IRRG → Potsdam IRRG | 66.34 | 66.51 | 65.56 | 64.17 | 63.12 |
| DFC25 → LoveDA | 39.65 | 39.26 | 39.11 | 39.07 | 39.12 |
| LoveDA → DFC25 | 41.15 | 42.13 | 42.14 | 40.42 | 39.07 |
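To illustrate how $\lambda_{\text{adv}}$ in Table 7 enters training, here is a hedged sketch of a hybrid objective combining supervised cross-entropy with a non-saturating adversarial term. The discriminator design, the `ignore_index`, and all function names are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(seg_logits: torch.Tensor,
                labels: torch.Tensor,
                disc_logits_target: torch.Tensor,
                lambda_adv: float = 0.10) -> torch.Tensor:
    """Sketch of the co-training objective (illustrative, not the paper's code).

    seg_logits: (B, C, H, W) predictions on translated, labeled images.
    labels: (B, H, W) ground-truth class indices (255 = ignore, assumed).
    disc_logits_target: discriminator scores on target-domain predictions.
    """
    # Supervised term on translated (source -> target style) images.
    loss_ce = F.cross_entropy(seg_logits, labels, ignore_index=255)
    # Adversarial term: push target-domain outputs to look source-like
    # (label 1) from the discriminator's point of view.
    loss_adv = F.binary_cross_entropy_with_logits(
        disc_logits_target, torch.ones_like(disc_logits_target))
    return loss_ce + lambda_adv * loss_adv
```

Across the four transfer directions in Table 7, $\lambda_{\text{adv}}$ near 0.10 is the most stable setting, which is why the sketch uses it as the default.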
[1] SONG Miao, CHEN Zhiqiang, WANG Peisong, et al. DetDiffRS: A detail-enhanced diffusion model for remote sensing image super-resolution[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4763–4778. doi: 10.11999/JEIT250995.
[2] DIAO Wenhui, GONG Shuo, XIN Linlin, et al. A model pre-training method with self-supervised strategies for multimodal remote sensing data[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1658–1668. doi: 10.11999/JEIT241016.
[3] YU Xiang and PANG Zhihao. YOLOX remote sensing image object detection algorithm based on FEB[J]. Journal of Chongqing University of Posts and Telecommunications: Natural Science Edition, 2024, 36(2): 319–327. doi: 10.3979/j.issn.1673-825X.202302120032.
[4] LI Xing, FAN Yangyu, GUO Zhe, et al. Edge domain adaptation for stereo matching[J]. Journal of Electronics & Information Technology, 2024, 46(7): 2970–2980. doi: 10.11999/JEIT231113.
[5] TEE Y Y, HONG Xuenong, CHENG Deruo, et al. Unsupervised domain adaptation with pseudo shape supervision for IC image segmentation[C]. 2024 IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), Singapore, 2024: 1–6. doi: 10.1109/IPFA61654.2024.10690992.
[6] HOFFMAN J, WANG Dequan, YU F, et al. FCNs in the wild: Pixel-level adversarial and constraint-based adaptation[EB/OL]. https://arxiv.org/abs/1612.02649, 2016.
[7] VU T H, JAIN H, BUCHER M, et al. ADVENT: Adversarial entropy minimization for domain adaptation in semantic segmentation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 2512–2521. doi: 10.1109/CVPR.2019.00262.
[8] ZOU Yang, YU Zhiding, VIJAYA KUMAR B V K, et al. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training[C]. 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 297–313. doi: 10.1007/978-3-030-01219-9_18.
[9] VU T H, JAIN H, BUCHER M, et al. DADA: Depth-aware domain adaptation in semantic segmentation[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 2019: 7363–7372. doi: 10.1109/ICCV.2019.00746.
[10] WANG Qin, DAI Dengxin, HOYER L, et al. Domain adaptive semantic segmentation with self-supervised depth estimation[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 8495–8505. doi: 10.1109/ICCV48922.2021.00840.
[11] ZHU Junyan, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 2242–2251. doi: 10.1109/ICCV.2017.244.
[12] HOFFMAN J, TZENG E, PARK T, et al. CyCADA: Cycle-consistent adversarial domain adaptation[C]. 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018: 1989–1998.
[13] ZHANG Lvmin, RAO Anyi, and AGRAWALA M. Adding conditional control to text-to-image diffusion models[C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 3813–3824. doi: 10.1109/ICCV51070.2023.00355.
[14] DONG Xiao, HUANG Runhui, WEI Xiaoyong, et al. UniDiff: Advancing vision-language models with generative and discriminative learning[EB/OL]. https://arxiv.org/abs/2306.00813, 2023.
[15] TANG Datao, CAO Xiangyong, HOU Xingsong, et al. CRS-Diff: Controllable remote sensing image generation with diffusion model[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5638714. doi: 10.1109/TGRS.2024.3453414.
[16] KINGMA D P and WELLING M. Auto-encoding variational Bayes[EB/OL]. https://arxiv.org/abs/1312.6114v11, 2022.
[17] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]. 38th International Conference on Machine Learning (ICML), 2021: 8748–8763.
[18] PARK T, LIU Mingyu, WANG Tingchun, et al. Semantic image synthesis with spatially-adaptive normalization[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 2332–2341. doi: 10.1109/CVPR.2019.00244.
[19] LIANG Yan, YI Chunxia, WANG Guangyu, et al. Semantic segmentation of remote sensing image based on multi-scale semantic encoder-decoder network[J]. Acta Electronica Sinica, 2023, 51(11): 3199–3214. doi: 10.12263/DZXB.20220503.
[20] ZHANG Xiaoke, HU Zongsheng, ZHANG Guoliang, et al. Dose calculation in proton therapy using a discovery cross-domain generative adversarial network (DiscoGAN)[J]. Medical Physics, 2021, 48(5): 2646–2660. doi: 10.1002/mp.14781.
[21] LUO Yawei, ZHENG Liang, GUAN Tao, et al. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 2502–2511. doi: 10.1109/CVPR.2019.00261.
[22] XU Tao, SUN Xian, DIAO Wenhui, et al. FADA: Feature aligned domain adaptive object detection in remote sensing imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5617916. doi: 10.1109/TGRS.2022.3147224.
[23] TSAI Y H, HUNG W C, SCHULTER S, et al. Learning to adapt structured output space for semantic segmentation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018: 7472–7481. doi: 10.1109/CVPR.2018.00780.
[24] HOYER L, DAI Dengxin, and VAN GOOL L. DAFormer: Improving network architectures and training strategies for domain-adaptive semantic segmentation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 9914–9925. doi: 10.1109/CVPR52688.2022.00969.
[25] JI Yuxiang, HE Boyong, QU Chenyuan, et al. Diffusion features to bridge domain gap for semantic segmentation[C]. 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025: 1–5. doi: 10.1109/ICASSP49660.2025.10888537.
[26] WANG Libo, LI Rui, ZHANG Ce, et al. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 196–214. doi: 10.1016/j.isprsjprs.2022.06.008.
[27] LIANG Yan, YANG Huilin, and SHAO Kai. A vehicle-infrastructure cooperative 3D object detection scheme based on adaptive feature selection[J]. Journal of Electronics & Information Technology, 2025, 47(12): 5214–5225. doi: 10.11999/JEIT250601.