Convolutional Neural Network and Vision Transformer-driven Cross-layer Multi-scale Fusion Network for Hyperspectral Image Classification

ZHAO Feng; GENG Miaomiao; LIU Hanqiang; ZHANG Junjie; YU Jun

doi:10.11999/JEIT231209

Volume 46 Issue 5

May 2024

ZHAO Feng, GENG Miaomiao, LIU Hanqiang, ZHANG Junjie, YU Jun. Convolutional Neural Network and Vision Transformer-driven Cross-layer Multi-scale Fusion Network for Hyperspectral Image Classification[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2237-2248. doi: 10.11999/JEIT231209

doi: 10.11999/JEIT231209 cstr: 32379.14.JEIT231209

1. School of Communications and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
2. School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
3. University of Science and Technology of China, Hefei 223700, China

Funds: The National Natural Science Foundation of China (62071379, 62071378, 62106196), The Youth Innovation Team of Shaanxi Universities

Received Date: 2023-11-01
Revision Received Date: 2024-03-31

Available Online: 2024-04-18

Publish Date: 2024-05-30

Abstract

Hyperspectral image (HSI) classification is one of the most prominent research topics in geoscience and remote sensing image processing. In recent years, methods that combine Convolutional Neural Networks (CNNs) with vision transformers have succeeded in HSI classification by jointly considering local and global information. Nevertheless, ground objects in HSIs vary in scale and contain rich texture information and complex structures, and existing methods that combine CNNs with vision transformers usually have limited capability to extract the texture and structural information of multi-scale ground objects. To overcome these limitations, a CNN and vision transformer-driven cross-layer multi-scale fusion network is proposed for HSI classification. Firstly, from the perspective of combining CNN and vision transformer, a cross-layer multi-scale local-global feature extraction branch is constructed, composed of a convolution-embedded vision transformer architecture and a cross-layer feature fusion module. Specifically, to enhance attention to multi-scale ground objects in HSIs, the convolution-embedded vision transformer captures multi-scale local-global features by combining a multi-scale CNN with the vision transformer. Furthermore, the cross-layer feature fusion module aggregates hierarchical multi-scale local-global features, thereby combining the shallow texture information and deep structural information of ground objects. Secondly, a group multi-scale convolution branch is designed to explore the potential multi-scale features in the abundant spectral bands of HSIs. Finally, to mine local spectral details and global spectral information in HSIs, a residual group convolution module is designed to extract local-global spectral features. Experimental results on the Indian Pines, Houston 2013, and Salinas Valley datasets confirm the effectiveness of the proposed method.
- Hyperspectral image (HSI) classification
- Convolutional Neural Network (CNN)
- Vision transformer
- Multi-scale features
- Fusion network
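
The idea of group multi-scale convolution over spectral bands, as mentioned in the abstract, can be illustrated with a small pure-Python sketch. It is not the authors' module: the function names, the contiguous band grouping, the kernel sizes, and the fixed averaging kernels (standing in for learned weights) are all assumptions made for illustration.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D correlation with zero padding; output length equals input length.

    Assumes an odd kernel size so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def group_multiscale_conv(
    spectrum: Sequence[float], kernel_sizes: Sequence[int] = (1, 3, 5)
) -> List[float]:
    """Split the bands into contiguous groups and filter each group at its own scale.

    Hypothetical illustration: one group per kernel size, and each kernel is a
    simple moving average (a placeholder for weights a network would learn).
    """
    n_groups = len(kernel_sizes)
    if len(spectrum) % n_groups != 0:
        raise ValueError("number of bands must be divisible by the number of groups")
    group_len = len(spectrum) // n_groups

    fused: List[float] = []
    for g, k in enumerate(kernel_sizes):
        group = spectrum[g * group_len:(g + 1) * group_len]
        kernel = [1.0 / k] * k
        # Concatenate the per-group responses back along the band axis.
        fused.extend(conv1d_same(group, kernel))
    return fused


if __name__ == "__main__":
    # A toy 6-band pixel spectrum, split into two groups of three bands.
    pixel = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(group_multiscale_conv(pixel, kernel_sizes=(1, 3)))
```

Each group sees the same kind of input but a different receptive field, so the concatenated output mixes fine and coarse spectral responses. In a deep learning framework the same pattern would typically be written with grouped convolution layers rather than explicit loops.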
