Remote Sensing Land-Cover Classification Combining Multi-Modal and Multi-Scale Fusion with Mamba
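Summary: The rapid development of remote sensing imaging technology has produced massive and diverse data for remote sensing land-cover classification, and how to exploit the complementarity of multimodal data to improve classification performance has become a research focus. In recent years, the Mamba model has been applied successfully in image processing thanks to its distinctive architecture and strong global modeling capability. Multi-scale vision Mamba models in particular handle complex spatial distributions well, which suits the large scale differences and complex orientations of remote sensing ground objects. To make full use of Mamba's strengths in extracting and fusing features from remote sensing data, this paper proposes a Mamba-based multi-modal and multi-scale fusion model for remote sensing land-cover classification (M3RS). First, the model uses a multi-scale spatial encoder to extract features from Light Detection And Ranging (LiDAR) images and Synthetic Aperture Radar (SAR) images and, based on the unique data structure of the HyperSpectral Image (HSI), proposes a multi-scale spatio-spectral encoder to extract its complex spatial-spectral features. Next, a multi-modal feature fusion module combining Cross-Mamba and Channel-Concatenated Mamba is proposed: Cross-Mamba fuses multimodal spatial features efficiently by exchanging state space parameters, and Channel-Concatenated Mamba fuses multimodal features thoroughly by constructing four channel scanning schemes. Finally, the model uses an improved multi-scale feature fusion module to fuse multi-scale features layer by layer and extract highly discriminative evidence for classification, which effectively improves land-cover classification accuracy. Classification experiments on the Muufl, Houston2013 and Augsburg datasets verify the effectiveness of M3RS.
Abstract: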
Objective The rapid development of remote sensing imaging technology has generated massive and diverse data for remote sensing land-cover classification. In recent years, Mamba-based models have been applied successfully in image processing owing to their distinctive architectures and powerful global modeling capabilities. Among them, multi-scale vision Mamba models handle complex spatial distributions well, which matches the characteristics of remote sensing scenes, including large scale variations and complex orientations of ground objects. To fully exploit the advantages of Mamba models in extracting and fusing features from remote sensing data, a Mamba-based Multi-Modal and Multi-Scale Fusion Model for Remote Sensing Land-Cover Classification (M3RS) is proposed.
Methods The proposed model, M3RS, extracts and fuses features in three stages. First, a Multi-Scale Spatial Encoder based on Spatial Mamba extracts features from Light Detection And Ranging (LiDAR) images and Synthetic Aperture Radar (SAR) images. Because the HyperSpectral Image (HSI) has a unique data structure, a Multi-Scale Spatio-Spectral Encoder is proposed to extract its complex spatial-spectral features using Spatial Mamba and Spectral Mamba. Next, a Multi-Modal Feature Fusion Module, comprising the proposed Cross-Mamba and Channel-Concatenated Mamba, fuses the multimodal features. Cross-Mamba efficiently fuses multimodal spatial features by exchanging state space parameters across modalities, while Channel-Concatenated Mamba fully fuses multimodal features by constructing four channel scanning schemes. Finally, an improved Multi-Scale Feature Fusion Module fuses multiscale features layer by layer, yielding highly discriminative classification evidence that improves the accuracy of remote sensing land-cover classification.
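The abstract does not spell out the state space formulation behind these modules. For orientation, the block below restates the standard selective state space recurrence from Gu and Dao's Mamba formulation. It is generic background rather than the specific Cross-Mamba parameter exchange, and the symbols ($A$, $B$, $C$, $\Delta$, $h_t$, $x_t$, $y_t$) follow that formulation, not notation taken from this paper.

```latex
% Continuous-time state space model
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)

% Zero-order-hold discretization with step size \Delta
\bar{A} = \exp(\Delta A), \qquad
\bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B

% Discrete recurrence over a token sequence x_1, \dots, x_L
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t
```

In Mamba, $B$, $C$ and $\Delta$ are computed from the input, which makes the recurrence selective. Which of these parameters Cross-Mamba exchanges between modalities is not stated in this abstract.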
Results and Discussions Comparative experiments on three publicly available multimodal remote sensing land-cover classification datasets evaluate the proposed model against seven mainstream models. The proposed model attains the best Overall Accuracy (OA) and Kappa coefficient on all three datasets, and the best Average Accuracy (AA) on Muufl and Houston2013; on Augsburg its AA equals that of CCRNet to two decimal places. Specifically, on the Muufl dataset, the OA of the proposed model is 3.49, 3.80, and 4.02 percentage points higher than that of the CNN-based, Transformer-based, and Mamba-based models, respectively, averaged within each family (Table 2, Fig. 8). On the Houston2013 and Augsburg datasets, the OA of the proposed model exceeds the average OA of all comparative algorithms by 3.37 and 3.11 percentage points, respectively (Table 3, Table 4). These results indicate that integrating a multi-modal, multi-scale architecture with the Mamba model effectively improves the accuracy of remote sensing land-cover classification. In addition, an ablation experiment validates the contribution of each proposed module to classification accuracy (Table 5): Spectral Mamba gives the largest single improvement, and each fusion module contributes to the overall performance to a different degree. The hyperparameter experiment identifies effective hyperparameter configurations for multiscale remote sensing image fusion (Table 6). Finally, compared with a Transformer model that uses an identical multi-scale architecture, the Mamba model not only achieves higher classification accuracy but also reduces the parameter count by 37.4% and shortens the training time by 10.7%, improving both accuracy and efficiency (Fig. 9).
Conclusions The proposed M3RS employs the Mamba model to fuse multimodal and multiscale features, effectively enhancing the performance of remote sensing land-cover classification.
First, the different encoders used in M3RS address the disparities among multimodal data and thereby provide richer complementary information for fusion and classification. Second, the proposed Cross-Mamba and Channel-Concatenated Mamba take the similarities and differences between Mamba and Transformer into account: Cross-Mamba provides efficient multimodal spatial feature interaction and Channel-Concatenated Mamba provides comprehensive multimodal feature fusion, which together form a hierarchical multimodal fusion approach. Third, the multiscale architecture mitigates, to a certain extent, the complex spatial distribution of remote sensing land covers, and the proposed Multi-Scale Feature Fusion Module, composed of Spatial Mamba and channel attention, integrates multiscale features and provides a reliable basis for the subsequent classification. Future research will continue to optimize the model by exploring the underlying principles of Mamba, and will investigate cross-attention mechanisms in depth to refine feature alignment in multimodal interaction and ensure reliable feature fusion.
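The three reported metrics (OA, AA, and Kappa) have standard definitions in terms of a confusion matrix. The sketch below shows those definitions in plain Python. The function name `classification_metrics` and the 3-class matrix `toy` are illustrative assumptions, not taken from the paper, and the code assumes every class has at least one test sample.

```python
def classification_metrics(confusion):
    """Return (OA, AA, Kappa) in percent from a square confusion matrix.

    confusion[i][j] counts test samples of true class i predicted as class j.
    Assumes every class has at least one test sample.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    # Overall Accuracy: fraction of all samples classified correctly.
    oa = correct / total

    # Average Accuracy: mean of the per-class accuracies (recalls).
    per_class = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
    aa = sum(per_class) / n_classes

    # Kappa: observed agreement corrected for the agreement expected by chance.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - expected) / (1 - expected)

    return 100 * oa, 100 * aa, 100 * kappa


# Hypothetical 3-class confusion matrix, for illustration only.
toy = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 2, 12],
]
oa, aa, kappa = classification_metrics(toy)
print(f"OA={oa:.2f}  AA={aa:.2f}  Kappa={kappa:.2f}")
```

Because AA weights every class equally, it is more sensitive than OA to classes with few test samples. This is consistent with the Augsburg results, where AA is far below OA for every model.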
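Table 1 Comparison of classification results on the Muufl dataset (%)

| Class (train/test) | CCRNet | MFT | ExVit | HCT | M2FNet | Cross-HL | MSFMamba | M3RS |
|---|---|---|---|---|---|---|---|---|
| 1: Trees (150/23096) | 89.97 | 87.90 | 89.59 | 92.20 | 90.92 | 91.39 | 88.51 | 92.02 |
| 2: Grass (150/4120) | 79.30 | 75.27 | 79.42 | 76.07 | 72.84 | 85.70 | 82.14 | 87.26 |
| 3: Mixed ground surface (150/6732) | 80.50 | 76.00 | 79.63 | 84.51 | 82.13 | 84.21 | 80.21 | 85.01 |
| 4: Dirt and sand (150/1676) | 94.09 | 94.45 | 95.29 | 96.78 | 96.30 | 96.96 | 94.57 | 97.85 |
| 5: Road (150/6537) | 87.33 | 78.75 | 76.89 | 89.02 | 85.07 | 90.12 | 87.30 | 93.21 |
| 6: Water (150/316) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 7: Building shadow (150/2083) | 87.47 | 91.55 | 89.34 | 88.57 | 92.89 | 91.21 | 90.64 | 92.17 |
| 8: Building (150/6090) | 96.31 | 95.14 | 93.35 | 97.11 | 94.98 | 95.53 | 95.50 | 97.34 |
| 9: Sidewalk (150/1235) | 77.17 | 60.32 | 66.64 | 76.44 | 73.28 | 82.02 | 72.06 | 85.26 |
| 10: Yellow curb (150/33) | 96.97 | 93.94 | 100.00 | 90.91 | 100.00 | 100.00 | 93.94 | 96.97 |
| 11: Cloth panels (150/119) | 97.48 | 99.16 | 99.16 | 99.16 | 97.48 | 99.16 | 99.16 | 99.16 |
| OA | 88.12 | 84.86 | 86.06 | 89.79 | 88.00 | 90.36 | 87.59 | 91.61 |
| AA | 89.69 | 86.59 | 88.12 | 90.07 | 89.63 | 92.39 | 89.46 | 93.30 |
| Kappa | 84.47 | 80.34 | 81.82 | 86.57 | 84.27 | 87.33 | 83.81 | 88.96 |

Table 2 Comparison of classification results on the Houston2013 dataset (%)

| Class (train/test) | CCRNet | MFT | ExVit | HCT | M2FNet | Cross-HL | MSFMamba | M3RS |
|---|---|---|---|---|---|---|---|---|
| 1: Healthy grass (198/1053) | 72.74 | 76.54 | 79.39 | 75.50 | 82.28 | 76.54 | 80.06 | 76.92 |
| 2: Stressed grass (190/1064) | 83.93 | 93.33 | 77.91 | 95.96 | 87.41 | 85.15 | 98.59 | 96.15 |
| 3: Synthetic grass (192/505) | 91.88 | 98.02 | 97.82 | 98.22 | 95.25 | 97.82 | 96.63 | 88.32 |
| 4: Trees (188/1056) | 89.30 | 89.96 | 87.97 | 81.72 | 92.90 | 88.92 | 93.28 | 89.68 |
| 5: Soil (186/1056) | 100.00 | 99.91 | 99.62 | 100.00 | 98.01 | 100.00 | 100.00 | 100.00 |
| 6: Water (182/143) | 95.80 | 95.80 | 95.80 | 95.80 | 95.80 | 95.80 | 100.00 | 100.00 |
| 7: Residential (196/1072) | 72.48 | 82.65 | 87.03 | 71.83 | 78.92 | 76.77 | 85.63 | 73.13 |
| 8: Commercial (191/1053) | 84.43 | 77.40 | 96.68 | 93.54 | 92.21 | 74.55 | 93.45 | 94.40 |
| 9: Road (193/1059) | 84.32 | 88.20 | 79.89 | 89.52 | 81.49 | 77.43 | 78.00 | 83.95 |
| 10: Highway (193/1059) | 63.71 | 70.85 | 65.54 | 66.51 | 78.09 | 68.53 | 59.65 | 96.81 |
| 11: Railway (181/1054) | 99.15 | 93.74 | 95.64 | 96.39 | 97.91 | 96.11 | 83.11 | 96.02 |
| 12: Parking lot 1 (192/1041) | 97.50 | 98.17 | 98.56 | 98.75 | 94.43 | 100.00 | 95.00 | 99.52 |
| 13: Parking lot 2 (184/285) | 70.88 | 76.14 | 76.84 | 84.56 | 82.81 | 72.28 | 82.46 | 80.70 |
| 14: Tennis court (181/247) | 100.00 | 100.00 | 100.00 | 100.00 | 99.19 | 100.00 | 100.00 | 100.00 |
| 15: Running track (187/473) | 99.79 | 99.37 | 99.15 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| OA | 85.75 | 88.13 | 87.91 | 88.26 | 89.28 | 85.73 | 87.97 | 90.95 |
| AA | 87.06 | 89.34 | 89.19 | 89.89 | 90.46 | 87.33 | 89.72 | 91.71 |
| Kappa | 84.52 | 87.10 | 86.87 | 87.25 | 88.36 | 84.49 | 86.94 | 90.17 |

Table 3 Comparison of classification results on the Augsburg dataset (%)

| Class (train/test) | CCRNet | MFT | ExVit | HCT | M2FNet | Cross-HL | MSFMamba | M3RS |
|---|---|---|---|---|---|---|---|---|
| 1: Forest (146/13361) | 93.51 | 88.90 | 93.65 | 96.16 | 94.78 | 92.44 | 94.93 | 95.39 |
| 2: Residential area (264/30065) | 99.04 | 97.43 | 95.92 | 99.24 | 97.38 | 97.95 | 99.49 | 99.01 |
| 3: Industrial area (21/3830) | 66.11 | 33.99 | 40.26 | 38.22 | 21.57 | 61.85 | 3.66 | 69.43 |
| 4: Low plants (248/26609) | 92.37 | 87.50 | 91.68 | 93.21 | 91.09 | 89.97 | 94.51 | 92.82 |
| 5: Allotment (52/523) | 61.95 | 51.05 | 48.95 | 64.63 | 37.48 | 53.73 | 30.59 | 58.51 |
| 6: Commercial area (7/1638) | 9.52 | 12.76 | 14.22 | 5.19 | 1.89 | 3.72 | 0.18 | 7.63 |
| 7: Water (23/1507) | 48.97 | 37.62 | 17.39 | 47.91 | 11.75 | 51.09 | 13.01 | 48.64 |
| OA | 91.05 | 86.15 | 87.75 | 90.41 | 86.94 | 89.28 | 88.02 | 91.62 |
| AA | 67.35 | 58.47 | 57.44 | 63.51 | 50.85 | 64.39 | 48.05 | 67.35 |
| Kappa | 87.17 | 80.05 | 82.26 | 86.04 | 80.75 | 84.68 | 82.04 | 87.94 |

Table 4 Ablation experiments on the Houston2013 dataset (%)

| Module | OA | AA | Kappa |
|---|---|---|---|
| Spatial Mamba only | 86.87 | 89.00 | 85.73 |
| + Spectral Mamba | 89.07 | 90.75 | 88.13 |
| + Cross-Mamba | 89.28 | 90.68 | 88.36 |
| + Channel-Concatenated Mamba | 90.14 | 91.47 | 89.29 |
| + Multi-Scale Feature Fusion Module | 90.95 | 91.71 | 90.17 |

Table 5 Hyperparameter experiments on the number of Spatial Mamba layers and the feature dimensions on the Houston2013 dataset (%)

| VSSBlock layers | Feature dimensions | OA | AA | Kappa |
|---|---|---|---|---|
| (2,2,9) | (64,128,256) | 90.95 | 91.71 | 90.17 |
| (2,2,9,2) | (64,128,256,512) | 89.52 | 91.00 | 88.62 |
| (2,2,27) | (64,128,256) | 89.88 | 91.35 | 89.01 |
| (2,2,9) | (96,192,384) | 87.29 | 88.85 | 86.20 |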
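The OA margins quoted in the abstract can be recomputed from the OA rows of the three comparison tables. The sketch below does this, with values copied from those rows. The grouping of baselines into CNN, Transformer and Mamba families is an assumption inferred here, not stated in the abstract, but it is the grouping under which the quoted Muufl figures come out. The names `margin` and `groups` are illustrative.

```python
# OA values (%) copied from the OA rows of the three comparison tables.
oa = {
    "Muufl": {"CCRNet": 88.12, "MFT": 84.86, "ExVit": 86.06, "HCT": 89.79,
              "M2FNet": 88.00, "Cross-HL": 90.36, "MSFMamba": 87.59,
              "M3RS": 91.61},
    "Houston2013": {"CCRNet": 85.75, "MFT": 88.13, "ExVit": 87.91,
                    "HCT": 88.26, "M2FNet": 89.28, "Cross-HL": 85.73,
                    "MSFMamba": 87.97, "M3RS": 90.95},
    "Augsburg": {"CCRNet": 91.05, "MFT": 86.15, "ExVit": 87.75, "HCT": 90.41,
                 "M2FNet": 86.94, "Cross-HL": 89.28, "MSFMamba": 88.02,
                 "M3RS": 91.62},
}

# Assumed grouping of the seven baselines by backbone family (inferred).
groups = {
    "CNN": ["CCRNet"],
    "Transformer": ["MFT", "ExVit", "HCT", "M2FNet", "Cross-HL"],
    "Mamba": ["MSFMamba"],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def margin(dataset, baselines):
    """OA of M3RS minus the mean OA of the named baselines, in points."""
    scores = oa[dataset]
    return scores["M3RS"] - mean(scores[name] for name in baselines)


for family, names in groups.items():
    print(f"Muufl, {family}: +{margin('Muufl', names):.2f}")

for dataset in ("Houston2013", "Augsburg"):
    baselines = [name for name in oa[dataset] if name != "M3RS"]
    print(f"{dataset}, all baselines: +{margin(dataset, baselines):.2f}")
```

Under this grouping the script gives 3.49, 3.80 and 4.02 for Muufl and 3.37 and 3.11 for Houston2013 and Augsburg. These match the abstract, so the quoted margins are differences in percentage points against family or overall averages.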